Dude, my struggle is to be able to operate a non-profit service free of charge w...

matheusmoreira · on Sept 6, 2023

Why is it relevant whether the traffic is human or automated? The whole point of the internet is you can put a server out there and anyone anywhere can connect to it with any HTTP client.

To me it seems like the only people who care about that are those who want to sell our attention to the highest bidder via advertising. Wouldn't you be having the same difficulties if there were just as much traffic coming from humans?

marginalia_nu · on Sept 6, 2023

I want to provide as many human beings as possible with value by distributing my processing power fairly between them. If I get DDoS:ed by a botnet, I won't provide anyone with anything other than optimistically an error page.

If I had infinite money and computing resources, this would be fine, but I'm just one guy with a not very power computer hosted on domestic broadband, and even though I give away compute freely, it just takes one bag of dicks with a botnet to use it all up for themselves, and without bot mitigation, I'm helpless to prevent it.

Oh and I actually do provide an API for free machine access, so it's not like they have to use headless browsers and go through the front door like this. But they still do.

Serves me right for trying to provide a useful service I guess?

TeMPOraL · on Sept 6, 2023

Arguably, the problem here is that you want to do it free of charge. That's the problem in general: adtech aside, people want to discriminate between "humans" and "bots" in order to fairly distribute resources. What should be happening though, is that every user - human and bot alike - cover their resource usage on the margin.

Tangent: there's a reason the browser is/used to be called an user agent. The web was meant to be accessed by automation. When I use a script to browse the web for me with curl, that script/curl is as much my agent as the browser is.

I see how remote attestation and other bot detection/prevention techniques make it cheaper for you to run the service the way you do. But the flip side is, those techniques will get everyone stuck using shitty, anti-ergonomic browsers and apps, whose entire UX is designed to best monetize the average person at every opportunity. In this reality, it wouldn't be possible to even start a service like yours without joining some BigCo that can handle the contractual load of interacting with every other business entity...

(Also need I remind everyone, that while the first customers of remote attestation are the DRM-ed media vendors, the second customer is your bank, and all the other banks.)

matheusmoreira · on Sept 6, 2023

> The web was meant to be accessed by automation.

Completely agree.

matheusmoreira · on Sept 6, 2023

I see. I respect that.

The bot detection won't come without cost. It will centralize power in the hands of Cloudflare and other giants. I think it's only a matter of time until they start exercising their powers. Is this really an acceptable tradeoff?

If we do accept it, I think the day will come when Cloudflare starts rejecting non-Chrome browsers, to say nothing of non-browser user agents.

marginalia_nu · on Sept 6, 2023

I don't see any good options at this point. The situation profoundly sucks for everyone involved. We're stuck between the almost absurdly adversarial open web, or bargaining with the devil at Cloudflare, and now Google's remote attestation which is basically Google taking a stab at the problem.

To be clear I don't think remote attestation is a good solution, but it's at least a solution. Any credible argument against Cloudflare or remote attestation needs to address state of the open web and have some sort of plan how to fix it. Or at least acknowledge that's what Google and CF are trying to solve. Dismissing the problem as a bunch of mindless corporate greed just doesn't fly. It affects anyone trying to host anything on the Internet, and is only getting worse. The status quo and where it's heading is completely untenable.

It's easy to say well just host static content, but that's ceding all of Internet discovery and navigation and discussion and interactivity to big tech, irreversibly pulling up the ladder on any sort of free and independent competition in these areas. That's, in my opinion, a far greater problem.

matheusmoreira · on Sept 6, 2023

Yes, I agree with you. It sucks having to make these choices and compromises. The adversarial nature of the web is difficult for service providers but it's actually ideal for users. We all benefit from being able to connect to servers using any browser, any HTTP client. This is especially true when the service providers don't like it. Software like yt-dlp is an example of software that interoperates adversarially with websites, empowering us.

I apologize if I came off as aggressive during my argument. It was not my intention. I think we reached the same conclusion though.

Maybe the true problem is bandwidth is too expensive to begin with. Would the problem still exist if the costs were negligible?

marginalia_nu · on Sept 6, 2023

Network bandwidth cost is negligible, it's hardware and processing power that's expensive. Each query I process is up to a 100 Mb disk read. I only have so much I/O bandwidth.

As far as I see it, there are two bad solutions to this problem.

The first bad solution is to have a central authority inspect most of the web's traffic and try to deduce who is human. This is the approach taken by Cloudflare, but essentially the same as Remote Attestation. It gives the chosen authority a private inspection hatch for most of the web's traffic, as well as unfettered authority to censor and deny service as they see fit.

The other bad option is a sort of 'free as in free enterprise' Ferengi Internet where each connection handshake involves haggling over the rate and then each request costs a fraction of a cent. This would remove the need to de-anonymize users, likely kill the ads business and virtually eliminate DDoS/sybil attacks. It would also be an enormous vector for money laundering, and as a cherry on top make running a search and discovery services much more expensive. I do think the crypto grifters pretty solidly killed the credibility of this option.

netdoll · on Sept 13, 2023

Xanadu comes close to this "ferengi Internet" mindset with some of the tactics it chooses for monetization of content, albeit from an entirely different angle (enabling remix culture more or less indiscriminately while preserving the sanctity of the existing copyright system and enabling royalties to flow to authors proportional to how their works are used and reused).

matheusmoreira · on Sept 6, 2023

> The other bad option is a sort of 'free as in free enterprise' Ferengi Internet where each connection handshake involves haggling over the rate and then each request costs a fraction of a cent.

> This would remove the need to de-anonymize users, likely kill the ads business and virtually eliminate DDoS/sybil attacks.

Sounds like a massive win to me on all fronts. I agree with you.

> It would also be an enormous vector for money laundering

I don't mind. If that's the price, I pay it gladly.

zb3 · on Sept 6, 2023

Is this a purposeful DDoS or just bots trying to scrape results? If this is a DDoS on purpose, what's their financial gain? Did they demand payment?

If you're talking about bots scraping content, then the question is also why. Perhaps by letting them do so, you indirectly provide even more human beings?

It's entirely possible that these questions are absurd, however, since scraping using headless browsers is not free, then there must be some reason for scraping a given service... and it's usually something that in the end benefits more human beings.

marginalia_nu · on Sept 7, 2023

Best guess is it's some attempt at blackhat SEO, to manipulate the query logs and typeahead suggestions (I don't have query logs but whatever, maybe they think I secretly forward queries to Google or something).

But really, fuck if I know. I've received no communication so I can only guess as what they're trying to do. I have a free public API they're more than welcome to use if they want to like actually use the search engine, but they still try to go through a botnet through the public web endpoint.

I've talked to a bunch of people operating other search engines and all of them are subject to this type of 24/7 DDOS. It's been going for nearly two years now.

zb3 · on Sept 7, 2023

Oh, I guess query logs could help reveal a pattern which could be throttled/blocked..

bryanrasmussen · on Sept 6, 2023

>Why is it relevant whether the traffic is human or automated?

because all traffic costs for the service provider, but the automated traffic can be run at thousands of users cheaper than it is to run one human user (who after all is bounded by time and cost of computation and bandwidth) whereas the automated is not bound by time, giving them the opportunity to DOS you - either on purpose or just accidentally.

zb3 · on Sept 6, 2023

But the solution for this is a rate limit, not captcha. The real reason they care about "human traffic" is because bots don't buy stuff.

marginalia_nu · on Sept 7, 2023

Rate limits do all of bupkis against a botnet. It's not possible to assume that each one IP or connection is one person. The crux that all of these initiatives like remote attestation are trying to solve is that as it is, one person may command tens of thousands of connections, and from a server-standpoint, there's really not much you can do to allocate resources fairly.

bryanrasmussen · on Sept 7, 2023

you're the first person to say anything about Captcha? The guy who started this argument needing some way to sort out human traffic operates a free service and is complaining the bot traffic makes it hard to offer a free service since bots cost money.

zb3 · on Sept 7, 2023

By captcha I meant "telling computers and humans apart", not necessarily a particular implementation.

Why are you focusing on bot traffic? Doesn't human traffic also cost money? Who operates bots and why?

marginalia_nu · on Sept 7, 2023

The problem is allocating resources fairly. A single human may operate tens of thousands of bots, and thus use disproportionate amount of resources, possibly all of them.