Merged with: Add website integration via web crawler
Leon
Status changed to: Open
Leon
Moving this back as it has been deprioritized for now
Leon
There's already a crawler; expose it in the desktop app so users can add and index their own websites.
Leon
Merged with: Add website integration via web crawler
Walter
Will the website be an archive then? Will it be updated? What will happen when the website disappears? Will I have a rule to either delete it from my archive as well, or keep the last (or a specific) snapshot?
Stephanie Henry
Could you describe more precisely what this means and how it is envisioned?
Leon
The use case is to say: "I want website X in my search results."
The idea is to add a URL to sources and have the app crawl the website (download all pages + linked pages). The website would then appear as a source in the app and the pages would appear among other search results.
Make sense?
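A rough sketch of the "download all pages + linked pages" step as a breadth-first crawl, in Python. It assumes the crawl stays on the starting site's host and stops at a `max_pages` cap; `crawl`, `LinkExtractor`, `max_pages` and the injected `fetch` function are hypothetical names for illustration, not the existing crawler's API.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List, Optional
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href values of <a> tags in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def crawl(start_url: str,
          fetch: Callable[[str], Optional[str]],
          max_pages: int = 500) -> Dict[str, str]:
    """Breadth-first crawl of pages on the same host as start_url.

    `fetch` (hypothetical hook) returns a page's HTML, or None on failure.
    Returns a mapping of URL -> HTML for every page that was downloaded.
    """
    host = urlparse(start_url).netloc
    start = urldefrag(start_url).url
    queue = deque([start])
    seen = {start}
    pages: Dict[str, str] = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # skip pages that failed to download
        pages[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.hrefs:
            # Resolve relative links and drop "#fragment" so each page is queued once.
            link = urldefrag(urljoin(url, href)).url
            parsed = urlparse(link)
            # Assumption: only follow http(s) links on the same host.
            if parsed.scheme in ("http", "https") and parsed.netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


# Demo with an in-memory "website" instead of real HTTP requests.
site = {
    "https://example.com/": '<a href="/about">About</a> <a href="https://other.org/">Ext</a>',
    "https://example.com/about": '<a href="/#top">Home</a>',
}
pages = crawl("https://example.com/", site.get)
print(sorted(pages))  # ['https://example.com/', 'https://example.com/about']
```

Injecting `fetch` keeps the link-following logic separate from networking, so rate limiting and the User-Agent can live in the fetch function; the external link to other.org is ignored under the same-host assumption.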
Diane Defores
wow so cool.
Jan Sievers
Looking forward to this!
Philippe Ruaudel
Very nice feature, but please be careful. Indexing should be gentle:
- Max 10 URLs per minute, or let the user set the frequency
- Set a user-agent
- Publish the IPs in use (for whitelisting)
Otherwise, the crawler will be blocked by Wordfence or Cloudfilt (I use them) or other security tools.
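A minimal sketch of the first two rules above in Python: a throttle that spaces requests to a configurable rate (defaulting to the 10 URLs per minute suggested here) and a fetch helper that sends a descriptive User-Agent. `Throttle`, `polite_fetch`, the User-Agent string and the example.com URL are hypothetical placeholders, not the app's actual crawler.

```python
import time
import urllib.request
from typing import Callable, Optional


class Throttle:
    """Spaces out requests so that at most `per_minute` are issued per minute."""

    def __init__(self, per_minute: float = 10,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        if per_minute <= 0:
            raise ValueError("per_minute must be positive")
        self.interval = 60.0 / per_minute  # 6 s between requests at 10/min
        self.clock = clock
        self.sleep = sleep
        self.next_allowed: Optional[float] = None

    def wait(self) -> None:
        """Block until the next request is allowed, then reserve the following slot."""
        now = self.clock()
        if self.next_allowed is not None and now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.interval


# Hypothetical identifier; a real one should point site owners to crawler info.
USER_AGENT = "ExampleDesktopCrawler/0.1 (+https://example.com/crawler-info)"


def polite_fetch(url: str, throttle: Throttle, timeout: float = 30.0) -> Optional[str]:
    """Fetch one page with a descriptive User-Agent, respecting the throttle."""
    throttle.wait()
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset, errors="replace")
    except (OSError, LookupError):  # network/HTTP errors or an unknown charset
        return None
```

Sleeping until a fixed interval has passed keeps requests evenly spaced rather than bursty; a crawler would share one `Throttle` per website and call `polite_fetch(url, throttle)` for every page. Publishing the crawler's IPs is an operational step outside this code.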
Leon
... with reasonable assumptions, so we don't end up DoSing websites 😅
Leon
Status changed to: Planned