
> Plenty of service operators would like to block every scraper that doesn't obey their robots.txt, but there's no good way to do that without blocking human traffic too (Anubis et al are okay, but they are half-measures)

Why are Anubis-type mitigations half-measures?




Anubis, go-away, etc. are great, don't get me wrong -- but what Anubis does is impose a compute cost (a proof-of-work challenge) on every request. The website operator is hoping that the cost will have a rate-limiting effect on scrapers while minimally impacting the user experience. It's almost like chemotherapy: you poison everyone in the hope that the aggressive bad actors are hurt more than the well-behaved visitors. Even the Anubis readme calls it a nuclear option. In practice it appears to work pretty well, which is great!
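
To make "a cost on every request" concrete, here's a minimal sketch of the kind of SHA-256 proof-of-work exchange Anubis-style tools rely on: the server hands the browser a random challenge, the browser grinds nonces until the hash clears a difficulty bar, and the server checks the answer with a single hash. The difficulty value, challenge format, and leading-zero-bits rule below are illustrative assumptions, not Anubis's actual wire format:

    package main

    import (
        "crypto/rand"
        "crypto/sha256"
        "encoding/hex"
        "fmt"
        "math/bits"
    )

    // leadingZeroBits counts the leading zero bits of a SHA-256 digest.
    func leadingZeroBits(sum [32]byte) int {
        n := 0
        for _, b := range sum {
            if b == 0 {
                n += 8
                continue
            }
            n += bits.LeadingZeros8(b)
            break
        }
        return n
    }

    // solve is the visitor's side: grind nonces until the hash clears the bar.
    func solve(challenge string, difficulty int) uint64 {
        for nonce := uint64(0); ; nonce++ {
            sum := sha256.Sum256([]byte(fmt.Sprintf("%s:%d", challenge, nonce)))
            if leadingZeroBits(sum) >= difficulty {
                return nonce
            }
        }
    }

    // verify is the server's side: one hash, no matter how hard solving was.
    func verify(challenge string, nonce uint64, difficulty int) bool {
        sum := sha256.Sum256([]byte(fmt.Sprintf("%s:%d", challenge, nonce)))
        return leadingZeroBits(sum) >= difficulty
    }

    func main() {
        // The server issues a random challenge with a modest difficulty.
        var raw [16]byte
        if _, err := rand.Read(raw[:]); err != nil {
            panic(err)
        }
        challenge := hex.EncodeToString(raw[:])
        const difficulty = 16 // ~65k hashes expected on average

        nonce := solve(challenge, difficulty) // done client-side in real deployments
        fmt.Println("nonce:", nonce, "valid:", verify(challenge, nonce, difficulty))
    }

The asymmetry is the point: verification costs the server one hash, solving costs the client roughly 2^difficulty hashes on average, and that cost is paid by every visitor, human or bot.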

It's a half-measure because:

1. You're slowing down scrapers, not blocking them. They will still scrape your site content in violation of robots.txt.

2. Scrapers with plenty of compute to spare (as opposed to those limited by their pool of IP proxies) will not be significantly bottlenecked by this.

3. This may kick off an arms race: AI companies respond by beefing up their scraping infrastructure, operators respond with harder PoW challenges, and so on. The end state of that hypothetical is a more inconvenient and less efficient internet for everyone, including human users (see the back-of-envelope sketch below).
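
To put rough numbers on point 3 (the hash rates below are back-of-envelope assumptions, not benchmarks): expected work grows as 2^difficulty, so any difficulty high enough to slow a scraper hashing natively across many cores pushes a phone browser's wait from a blink to minutes.

    package main

    import "fmt"

    func main() {
        // Assumed hash rates, purely illustrative: a phone hashing in the
        // browser versus a scraping farm hashing natively on many cores.
        const phoneHashesPerSec = 2e5
        const farmHashesPerSec = 2e9

        for _, difficulty := range []int{16, 20, 24, 28} {
            expected := float64(uint64(1) << uint(difficulty)) // ~2^difficulty hashes on average
            fmt.Printf("difficulty %2d bits: phone ~%8.2fs, farm ~%8.4fs\n",
                difficulty, expected/phoneHashesPerSec, expected/farmHashesPerSec)
        }
    }

Under those assumptions, at 28 bits the farm waits a fraction of a second while a human on a phone waits over twenty minutes -- which is why ratcheting difficulty is a dead end.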

To be clear: I think Anubis is a great tool for website operators, and one of the best self-hostable options available today. However, it's a workaround for the core problem: we can't reliably distinguish the traffic of badly behaved AI scrapers from legitimate user traffic.



