Yeah, also this means the death of archival efforts like the Internet Archive.

jeroenhd · 2025-04-19T22:09:26 1745100566

Welcome scrapers (IA, maybe Google and Bing) can publish their IP addresses and get whitelisted. Websites that want to prevent being on the Internet Archive can pretty much just ask for their website to be excluded (even retroactively).

[Cloudflare](https://developers.cloudflare.com/cache/troubleshooting/alwa...) lists the Internet Archive as operating from 207.241.224.0/20 and 208.70.24.0/21, so disabling the bot-prevention framework for connections from those ranges should be enough.
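For illustration, a minimal sketch of that kind of allowlist check in Python, using only the standard `ipaddress` module. The two ranges are the ones quoted above; the names `ALLOWED_NETWORKS` and `is_allowlisted` are made up for this example, and how you wire the result into your actual bot-prevention setup depends entirely on what you run.

```python
import ipaddress

# Ranges quoted in the comment above for the Internet Archive.
# Hypothetical constant name; verify the ranges yourself before relying on them.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("207.241.224.0/20"),
    ipaddress.ip_network("208.70.24.0/21"),
]


def is_allowlisted(remote_addr: str) -> bool:
    """Return True if remote_addr falls inside one of the allowed ranges."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Not a parseable IP address: treat it as not allowlisted.
        return False
    return any(addr in net for net in ALLOWED_NETWORKS)


if __name__ == "__main__":
    for candidate in ("207.241.225.10", "208.70.32.0", "not-an-ip"):
        print(candidate, is_allowlisted(candidate))
```

Requests where `is_allowlisted` returns True would skip the bot checks, and everything else goes through them as usual. This only works if the address you check is the real peer address and not a client-supplied header.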

realusername · 2025-04-20T02:22:27 1745115747

That's basically asking to close the market in favor of the current actors.

New actors have the right to emerge.

jeroenhd · 2025-04-20T08:10:14 1745136614

They have the right to try to convince me to let them scrape me. Most of the time they're thinly veiled data traders. I haven't seen any new company try to scrape my stuff since maybe Kagi.

Kagi is welcome to scrape from their IP addresses. Other bots that behave are fine too (Huawei and various other Chinese bots don't and I've had to put an IP block on those).

0dayz · 2025-04-20T04:39:46 1745123986

No they don't.

There's no rule that you have to let anyone in who claims to be a web crawler.

realusername · 2025-04-20T07:54:43 1745135683

So who decides that you can be one? Right now it's Cloudflare, a literal monopoly...

The truth is that I sympathize with the people trying to use mobile connections to bypass such a cartel.

What Cloudflare is doing now is worse than the web crawlers themselves and the legality of blocking crawlers with a monopoly is dubious at best.

areyourllySorry · 2025-04-20T06:54:56 1745132096

which is why they will stop claiming to be one.

chii · 2025-04-20T07:27:24 1745134044

so what happened to competition fostering a better outcome for all then?

areyourllySorry · 2025-04-20T06:54:18 1745132058

a large chunk of the Internet Archive's snapshots are from ArchiveTeam, where "warriors" bring their own IPs (and they crawl respectfully!). Save Page Now is important too, but you don't realise what is useful until you lose it.

trinsic2 · 2025-04-20T01:22:39 1745112159

This sounds like it would be a good idea. Create a whitelist of IPs and block the rest.