
Given what the "not recognized as anything in particular" User-Agent strings in the request logs to our API SaaS look like, I have a feeling that 1.27% of "browsers" could very well actually be various kinds of special-purpose scrapers that have accidentally stumbled out of their ___domain of expertise.

And as such, they may not necessarily know how to parse <html>. They could be JSON scrapers!
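For illustration, here's a minimal sketch (TypeScript, Node 18+; the endpoint and User-Agent string are made up) of the sort of special-purpose scraper I'm imagining: it only ever expects JSON, so if it wanders onto an HTML page it has nothing useful to do, yet in the logs it still shows up as some unrecognized "browser".

  // Hypothetical special-purpose scraper: it assumes every response is JSON.
  // The endpoint and the User-Agent string are invented for illustration.
  async function scrapePrices(url: string): Promise<void> {
    const res = await fetch(url, {
      headers: { "User-Agent": "acme-price-watcher/2.1" }, // not any known browser
    });

    const contentType = res.headers.get("content-type") ?? "";
    if (!contentType.includes("application/json")) {
      // Stumbled out of its ___domain of expertise: it has no idea how to parse <html>.
      throw new Error(`expected JSON, got ${contentType}`);
    }

    console.log(await res.json());
  }

  scrapePrices("https://api.example.com/v1/prices").catch(console.error);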




Good point! Would we call them browsers? Well, no, but they do speak HTTP and are bound to get picked up in the data.

I suppose that would be the remaining 0.03% though, since the 1.27% is accounted for, if we trust the upstream browser usage data.


I think the word "browser" usually doesn't appear in W3C documents. It's a "user agent". And as long as the scraper works on behalf of a user, it fits nicely.


In my usage over the past two decades, a browser is a user agent whose output is meant to be consumed by a human. Contrast with a script or scraper, which are user agents whose output is designed to be the input to another processing technique.


A bit late, but let me try to refine this definition a bit, because I find it very interesting.

To me, a "browser" is:

1. a User-Agent that interacts with the web through a HATEOAS model (i.e. it works in terms of hypermedia, not structured data);

2. with some external actor — the "user" — being "in the loop" for at least some of those HATEOAS interactions,

3. where the "user" is expected to be an intelligent, agile actor: one who can cope with changes in the possible HATEOAS interactions, or the insertion of novel HATEOAS interactions they weren't pre-trained on, etc.;

4. and where the browser, as a User-Agent, is designed to offer such intelligent, agile actors an interactive tool for understanding and examining the current HATEOAS state of a web resource — a tool which enables such actors to not just receive a static description of a HATEOAS resource, but to probe the resource, and to observe changes in the resource (remember: hypermedia includes JavaScript!), allowing the user to gradually refine a decision about how they will interact with the HATEOAS resource.

5. Finally, a browser then offers a "user" the ability to execute their decided-upon HATEOAS interactions through the same interactive representational model that enabled them to learn about the HATEOAS state. The same (usually visual) representations of links that can be hovered to see the URL or alt text, can be clicked to navigate to them; the same (usually visual) representations of form fields that can be examined for labels or placeholder text, can be click-focused and targeted with keyboard input; etc. When the user makes decisions about how to interact with the browser, they're making decisions about how to interact with a coherent Human-Computer Interface that the browser is exposing to the user — one where the information and the affordances co-exist in the representational model.

Why so nit-picky? Because the edge-cases are weird. Here's what I mean:

• Chrome itself is, obviously, a browser. Chrome shows a human being (or a dog, or a robot's webcam) a visual rendering of a webpage on a screen. This "user", outside the computer, looking at the screen, can poke at the page with the mouse and keyboard to understand and interrogate the state of the HATEOAS resource at hand (a document with links; a form; some kind of CAPTCHA test; etc.) The human makes a decision about how to proceed given this understanding, and tells Chrome to do it (by clicking one of the visually-represented links; focusing and typing into the visually-represented form fields; etc.)

• w3m is also a browser. Same deal as above, even though it's a character-array representation rather than pixels.

• On the other hand, headless Chrome, when driven by a Puppeteer script that visits URLs and renders out screenshots to bake thumbnail previews for a social-media site — should not be considered a browser (see the sketch after this list). There's no interactivity there, no "browsing"; it's just a dumb bot-agent using Chrome for its renderer.

• Headless Chrome, when driven by Scrapy via scrapy-puppeteer to extract data from a website — is almost, but not quite, a browser. Scrapy "views" pages, and browses between them! It clicks links! It presses buttons! It fills out login forms! But Scrapy isn't an intelligent, agile agent — it can't cope with changes to the site's HATEOAS model. It isn't making decisions, and it can't take advantage of the "browser as HATEOAS-resource gestalt interrogation tool." In the end, it's just using Chrome to create a trail of authentic-looking network requests, and then parsing the results, outside of Chrome, using brittle, hardcoded logic. It's a bot.

• But what if, instead of Scrapy, we put ChatGPT in the driver's seat of a headless Chrome instance, by having it speak the Chrome DevTools protocol, and then giving it a prompt to solve some high-level problem "by accessing the web"? Well, ChatGPT is an "intelligent, agile actor" by my definition — it doesn't need pre-training on how to deal with a specific website; it can respond to its workflow changing over time. And ChatGPT can see and interpret images — so it can take advantage of the "browser as HATEOAS interaction-space interrogation tool", by doing things like scrolling the viewport or positioning the virtual cursor over things, then fetching screenshots and interpreting them. So headless Chrome, in this use-case, is (acting as) a browser.

• How about headless Chrome driven by a chatbot or Alexa skill, in turn being interacted with by a human? Well, that would seemingly depend on the level of HCI fidelity that is exposed through the bot-as-proxy. If the bot only knows how to do a few programmed-in commands — and it does them by scraping data using pre-programmed models, parsing the hypermedia into structured data, and then describing that structured data to you — then no, it's not a browser. (Even though a human kicked off these interactions, and will see the final result of these interactions, those interactions are being intermediated by a system that isn't itself an intelligent, agile actor.) On the other hand, if the bot is able to be told to navigate to arbitrary URLs; and describes them by fetching screenshots and feeding them to an ML image-to-text model to conjure a description; and allows the user to tell it how to interrogate the loaded resource with commands like "hover over the red button; what do I see now?" — then the system of headless-Chrome-plus-chatbot is a browser.
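To make the thumbnail-rendering bullet above concrete, here's a minimal sketch of that kind of non-browser use of headless Chrome, using Puppeteer's standard launch/goto/screenshot calls (the URL and output path are placeholders). Nothing in it probes the page or makes decisions; it just drives the renderer and saves pixels.

  import puppeteer from "puppeteer";

  // Bake a thumbnail preview for a URL: no interactivity, no "browsing",
  // just headless Chrome as a renderer. The URL and output path are placeholders.
  async function renderThumbnail(url: string, outPath: string): Promise<void> {
    const browser = await puppeteer.launch({ headless: true });
    try {
      const page = await browser.newPage();
      await page.setViewport({ width: 1200, height: 630 });
      await page.goto(url, { waitUntil: "networkidle2" });
      await page.screenshot({ path: outPath });
    } finally {
      await browser.close();
    }
  }

  renderThumbnail("https://example.com/some-post", "thumb.png").catch(console.error);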


I like this distinction!


I've come across browsers embedded in other apps. One of them was a coupon app. Apparently people open these apps for the coupons, get sent somewhere to browse, and then just keep browsing inside the embedded browser. It had a weird user agent. And of course the PlayStation browser was another weird one. So there might be real browsers in the "rest" group.


Using such application-embedded browsers is a very common technique to circumvent locally-enforced network access controls.


Should they be represented in that case? Scrapers can take practically any form, in any programming language.

Special purpose is specific, and therefore not standard. I can produce a web-centric language that excludes <html> but has zero use beyond being a jackass.

(Postscript: I don’t disagree with your point)



