The majority of the traffic on the web is from bots. For the most part, these bots are used to discover new content. These are RSS Feed readers, search engines crawling your content, or nowadays AI bo
That’s true. Scrapping is a gold mine for the people that don’t know. I worked for a place which crawls the internet and beyond (fetches some internal dumps we pay for). There is no chance a zip bomb would crash the workers as there are strict timeouts and smell tests (even if a does it will crash an ECS task at worst and we will be alerted to fix that within a short time). We were as honest as it gets though, following GDPR, honoring the robots file, no spiders or scanners allowed, only home page to extract some insights.
I am aware of some big name EU non-software companies very interested in keeping an eye on some key things that are only possible with scraping.
That’s true. Scrapping is a gold mine for the people that don’t know. I worked for a place which crawls the internet and beyond (fetches some internal dumps we pay for). There is no chance a zip bomb would crash the workers as there are strict timeouts and smell tests (even if a does it will crash an ECS task at worst and we will be alerted to fix that within a short time). We were as honest as it gets though, following GDPR, honoring the robots file, no spiders or scanners allowed, only home page to extract some insights.
I am aware of some big name EU non-software companies very interested in keeping an eye on some key things that are only possible with scraping.