18

Support crawling or importing website pages


Rafael Oliveira

Activity

Leon

Status changed to: Open

Leon

Moving this back as it has been deprioritized for now


Leon

There's already a crawler - expose it for the desktop app so users can add / index their own websites


Leon

Merged with: Add website integration via web crawler

Walter

Will the crawled website be stored as an archive, or will it be kept up to date? What happens when the website disappears? Will I be able to set a rule to either delete it from my archive as well, or keep the last snapshot or a specific one?


Stephanie Henry

Is there a better description of what exactly this means and how it is envisioned?


Leon

The use case is roughly: "I want to have website X in my search results."

The idea is to add a URL to sources and have the app crawl the website (download all pages + linked pages). The website would then appear as a source in the app and the pages would appear among other search results.
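
To make "download all pages + linked pages" concrete, here is a minimal sketch in Python of a breadth-first crawl starting from one URL. It is not the app's actual crawler. The names `crawl`, `fetch_html`, `LinkExtractor`, and `max_pages` are hypothetical, and staying on the same host as the start URL is an assumption made for this sketch.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_html(url):
    """Default fetcher: download a page over HTTP and decode it as text."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch=fetch_html, max_pages=50):
    """Breadth-first crawl of pages on the same host as start_url.

    Returns a dict mapping each visited URL to its HTML.
    The same-host restriction and max_pages cap are assumptions of this sketch.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that fail to download
        pages[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments.
            link, _ = urldefrag(urljoin(url, href))
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

The returned pages could then be handed to whatever indexer the app already uses for other sources. The `fetch` parameter is there so the download step can be swapped out, for example for a rate-limited fetcher.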

Make sense?


Diane Defores

Wow, so cool.


Jan Sievers

Looking forward to this!


Philippe Ruaudel

Very nice feature. Please be careful, though: indexing should be gentle:
- Crawl at most 10 URLs per minute, or let the user set the frequency
- Set a user-agent
- Mention the IPs in use (for whitelisting)
Otherwise, the crawler will be blocked by Wordfence or Cloudfilt (I use both) or other security tools.
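
The first two points above could look roughly like the following Python sketch. It is not the app's implementation. `PoliteFetcher`, its parameters, and the user-agent string in the usage note are hypothetical, and spacing requests evenly (one every 6 seconds for 10 per minute) is a simplifying assumption.

```python
import time
from urllib.request import Request, urlopen


class PoliteFetcher:
    """Fetches pages with a fixed User-Agent and a per-minute request cap."""

    def __init__(self, user_agent, max_per_minute=10,
                 clock=time.monotonic, sleep=time.sleep):
        self.user_agent = user_agent
        # Assumption: spread requests evenly instead of allowing bursts.
        self.min_interval = 60.0 / max_per_minute
        self._clock = clock
        self._sleep = sleep
        self._last_request = None

    def wait_for_slot(self):
        """Block until min_interval has passed since the previous request."""
        now = self._clock()
        if self._last_request is not None:
            remaining = self.min_interval - (now - self._last_request)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_request = now

    def build_request(self, url):
        """Attach the configured User-Agent so site owners can identify the crawler."""
        return Request(url, headers={"User-Agent": self.user_agent})

    def fetch(self, url):
        """Wait for a free slot, then download the page as text."""
        self.wait_for_slot()
        with urlopen(self.build_request(url), timeout=10) as response:
            return response.read().decode("utf-8", errors="replace")
```

For example, `PoliteFetcher("ExampleCrawler/0.1", max_per_minute=10).fetch(url)` would send at most one request every 6 seconds. A user-facing frequency setting would map to `max_per_minute`. The IP list for whitelisting is a documentation matter rather than something code can provide.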


Leon

... with reasonable assumptions so we don't end up DOSing websites 😅


Leon

Status changed to: Planned