> So instead of [building the browser one feature/spec at a time], we tend to focus on building “vertical slices” of functionality. This means setting practical, cross-cutting goals, such as “let’s get twitter.com/awesomekling to load”, “let’s get login working on discord.com”, and other similar objectives.
Seems similar to how Wine is developed: instead of just going down the list of API functions to implement, the emphasis seems more on "let's get SomeProgram.exe to run" or "let's fix the graphics glitch in SomeGame.exe". Console emulators (especially of the HLE variety) seem to have a similar flow.
I really like this approach because it’s a low-effort way to prioritize development.
I did this with the CPU for my Game Boy emulator: I picked a game I wanted to get working and just kept implementing opcodes each time it crashed with an “opcode not implemented” error.
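A minimal sketch of the shape of that loop (toy code, not the actual emulator; a couple of real Game Boy opcodes filled in, flags and timing omitted):

    #include <cstdint>
    #include <cstdio>
    #include <cstdlib>

    // Toy sketch: just enough CPU state to show the
    // "implement opcodes as the game trips over them" workflow.
    struct CPU {
        uint16_t pc = 0x0100; // Game Boy cartridge entry point
        uint8_t a = 0;
        uint8_t memory[0x10000] = {};

        void step()
        {
            uint8_t opcode = memory[pc++];
            switch (opcode) {
            case 0x00: /* NOP */ break;
            case 0x3C: ++a; break;                                     // INC A (flags omitted)
            case 0xC3: pc = memory[pc] | (memory[pc + 1] << 8); break; // JP a16
            // ...each crash below adds another case here...
            default:
                std::fprintf(stderr, "opcode not implemented: 0x%02X at 0x%04X\n",
                             opcode, static_cast<unsigned>(pc - 1));
                std::abort();
            }
        }
    };

Every abort() is effectively the game telling you what to implement next.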
It's also a great approach for porting games - replace all platform-specific code with assert(false) until it compiles, then fix the asserts as you encounter them until everything works.
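A sketch of what those stubs might look like (hypothetical platform functions, just to show the idea):

    #include <cassert>

    // Hypothetical platform layer for a port in progress: every function is
    // stubbed so the game compiles, and each assert fires the first time the
    // game actually needs that piece of the platform.
    void platform_play_sound(int sound_id)
    {
        (void)sound_id;
        assert(false && "TODO: sound playback not ported yet");
    }

    void platform_read_gamepad(int* buttons)
    {
        (void)buttons;
        assert(false && "TODO: gamepad input not ported yet");
    }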
This approach works better for Wine where the Windows binaries are a fixed target.
On the web, you may get Twitter's feed rendering acceptably, and then two days later they ship an insignificant redesign that happens to use sixteen CSS features you don't have and everything is totally broken again.
The point isn’t that “getting X to work” is a one-and-done job. Rather, you’re using major websites as indicators for what features to target next, because those are largely one-and-done.
Of course you are correct that supporting Twitter or any service is a moving target, so long as it changes. But that doesn't mean specific bugs can't be captured in a test case.
At one point, he deletes half the HTML file to isolate where in the site the problematic code is, in effect doing a kind of binary search. After a few iterations of this, he comes up with a very small case that exhibits the problem he's trying to solve.
It's clear he knows his way around the codebase and where to make changes, but isolating these test cases is probably as important. And presumably if you fixed enough of these issues (while following the specs), 99% of the modern web should work just fine.
> On the web, you may get Twitter's feed rendering acceptably, and then two days later they ship an insignificant redesign that happens to use sixteen CSS features you don't have and everything is totally broken again.
this is not directed at you, but at this attitude which is very common and which I see all the time: everyone is lightning fast to come up with reasons that something won't work.
why?
why do people say things without understanding that almost any given problem has subproblems, and that those can be solved?
in humans, negativity is always just under the surface, and positivity is often buried deeply, and I do not understand this. I don't think I ever will. people just love to be contrarians.
It's really tiring. It's everywhere on HN, I deal with it every day at work, and everywhere else I look.
Instead of it being criticism, the commenter could've seen it as a positive. Every time a site changes, you discover functionality you haven't implemented yet. Over time, you've implemented more and more. It's progress. Progress is good. Choosing the negative interpretation is so endemic and arbitrary and simply unnecessary.
If you have infinite time and resources with perfect communication/understanding you can solve a lot of engineering problems. No one has that. This is where the original quoted claim from the article comes from, "building a web browser is impossible". That's encoding a lot of experience and reality of the Brobdingnagian challenge of building a web browser from scratch on 2023's web.
It's not negativity to point out a downside to an approach to a particular problem. It's potentially useful to get feedback on a development approach. Constructive criticism is very important in engineering projects because it encodes assumptions of limitations we all have.
A positive statement like "oh those sub problems are solvable!" doesn't really provide any help. No shit the problems are solvable in a perfect world. Such statements aren't even necessarily constructive because they don't offer any analysis or advice. It smacks of toxic positivity[0].
not all positivity is toxic positivity, you know. I wasn't even positive, I was just anti-negative. being against negativity is not the same as being positive at all.
spouting out a problem you foresee being revealed after another problem is solved is not constructive criticism, it is reactionary and attention-seeking.
my comment is about comments like yours; unlimited time and energy to mention anything that makes what I say sound bad, improbable or difficult, and zero time or energy to even entertain the idea that my point of view is valid, and worth considering.
It's just an effort to enforce social conformity. A specific case of this type of browbeating may not be helpful, but on the whole it's often positive for the group to be mostly uniform. It's also often negative! More a value-neutral standard human tribal grouping behavior than anything.
With regard to the negative interpretation having positive value: yes, consider working in a company that values ‘not failing in particular cases’ more highly than ‘working in general, with room for improvement’.
For example, in a conservative corporation where a given project requires a ‘go’ from several departments, where the success of the project does not give an immediate advantage to those departments, but a failure will require them to explain why they didn’t ‘catch it in review’.
Sure, but Wine is a tool where if 90% of the stuff a user wants to run in it works, it's still great. Say, if you use it for gaming, you can play most of the games and only boot into Windows every few months when you hit one you can't.
But that's not how you use the web. If 90% of the pages worked in a browser I wouldn't use that browser, ever, because chances are I'd hit one that didn't at least once every few days.
It depends on the types of bugs and inconsistencies that show up in practice. If a page (or worse, the whole browser) crashes, that's a problem. If a column of ads doesn't scale right and gets bumped down to become its own lonely row below the main content, that's kind of ugly but almost a feature.
> But that's not how you use the web. If 90% of the pages worked in a browser I wouldn't use that browser, ever, because chances are I'd hit one that didn't at least once every few days.
This is a fact that Microsoft understood and pushed when they tried to get people to build pages for IE instead of working across both Navigator and IE.
I come across a variety of sites that show unusual behaviour every day. And as others pointed out, the page is mostly degraded, not unusable.
There are many grievances when using the web today: some are down to the lack of a set CSS spec, others to the complete and utter disregard for browser compatibility. I'm not going to tackle the monopoly of Chrome as a browser. However, there are a number of specific uses of this collection of ever-changing specs and implementations that eventually lead to page breakage in every new browser release.
The web is complex to tackle because everyone seems to think that they've a better idea of what a page is. Some of it is fair, some of it unfair. Nevertheless I would take this approach any day.
I completely agree. The people advocating for this type of development style are in another universe. The web is incredibly fragile.
There's a vast difference between a page being degraded by all browsers in a consistent manner per the W3C specs (especially the critical parts of a webpage, such as JS execution or malformed HTML) and the damn thing breaking in such a unique way that the web devs will never be able to fix the page for this new browser while getting it to work the same for the others. The worst case would be security being compromised, and that is a very long list of things to implement in both the HTTP layer and browser behavior before you even get started trying to render a page.
Web devs shouldn't be fixing pages for individual browsers but instead using features conservatively and degrading gracefully where features are not available. Browser bugs should be fixed in the browser.
And for 99% of websites security doesn't matter at all because it's just a one-off visit to read an article or look at some funny pictures without any user account.
Web browsers tend to degrade a webpage rather than fail to load it. Given that every day using Chrome I'll come across a website that is having issues with rendering, some degradation should be acceptable.
No wise old pro I ever met in any field who was actually worth listening to ever says things like "my young Padawan". Saying that makes you look like two kids in a trenchcoat trying to pretend to be an adult.
Which in this case is not inconsistent with this apparent misunderstanding of what moving goalposts means.
To use your own silly words, targeting the features that are the most used is literally one way to choose challenges wisely.
In general the Windows binaries are not really a fixed target for Wine either - many applications (e.g. multiplayer games) have online components that require you to run the latest version. And even for others, people will demand that the latest version work. Fixing things that are actually broken first is still a good way to prioritize - it's not like you stop implementing functionality once one target program works, you just move to another one. And if you run out of interesting programs then you can look at 100% API coverage.
yep, even better in unmaintained consoles, where the popular demand for concrete binaries is pretty much set in stone and you can even aim to complete the entire library of software binaries over time - as they're usually in the hundreds or low thousands rather than millions or billions
however the aim to build a reasonably sized and not ossified-to-previous-spec web browser is very interesting, especially if it's well engineered and made to be portable
> Console emulators (especially of the HLE variety) seem to have a similar flow.
Good insight.
I was going through this same spiel in my head the other day.
It's a flow that, if properly managed, can provide a good feedback system: it gives the developer positive feedback and, at the same time, successful milestones.
Say I'm building an emulator for a simple architecture with a few dozen opcodes...
"Alright. Let's start. Where do I start? How about NOP." So you implement NOP. You write some tests for it. Maybe you build a pretty printer into your opcode and you test it on disassembling a single byte file with a single NOP opcode.
Suddenly you have a working disassembler! It's obviously an artificial toy, but it works.
Maybe next you add an INC instruction. Add some tests. You'll need registers...
Build a simple one-INC-opcode binary file. Maybe add an executor in addition to the disassembler. Suddenly you've got registers working. And if you add another INC opcode byte, you can see your emulator changing behavior based on real external input!
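A rough sketch of where you'd be at that point, for an imaginary two-opcode architecture (made-up opcode numbers; the disassembler and the executor share one decode loop):

    #include <cstdint>
    #include <cstdio>
    #include <vector>

    // Imaginary ISA: 0x00 = NOP, 0x01 = INC r0.
    struct Machine {
        uint8_t r0 = 0;

        void run(const std::vector<uint8_t>& program, bool disassemble_only)
        {
            for (size_t pc = 0; pc < program.size(); ++pc) {
                switch (program[pc]) {
                case 0x00: // NOP
                    if (disassemble_only)
                        std::printf("%04zx: NOP\n", pc);
                    break;
                case 0x01: // INC r0
                    if (disassemble_only)
                        std::printf("%04zx: INC r0\n", pc);
                    else
                        ++r0;
                    break;
                default:
                    std::printf("%04zx: unknown opcode 0x%02X\n", pc,
                                static_cast<unsigned>(program[pc]));
                    return;
                }
            }
        }
    };

    int main()
    {
        Machine m;
        m.run({ 0x00, 0x01, 0x01 }, /*disassemble_only=*/true);  // pretty-print the "ROM"
        m.run({ 0x00, 0x01, 0x01 }, /*disassemble_only=*/false); // actually execute it
        std::printf("r0 = %u\n", static_cast<unsigned>(m.r0));   // r0 = 2
    }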
And so on. It's an interesting flow, you're right.
I don't think that link supports your comment. It says that vertical slices (at least as described by that article) are generally unrealistic in game dev.
You're right! I skimmed the first paragraph, as I was mostly just looking for a description of vertical slices in games. The rest of the content does not in fact validate my statement.
You're right that the "vertical slice" approach has been used very successfully in games development, though. Mark Cerny[0] has been evangelizing[1] the idea that preproduction isn't over until you have a "publishable first playable" (a.k.a., a complete vertical slice) of your game.
Basically, you shouldn't switch from "preproduction" to "production" until you can show (A) here is actual gameplay, (B) it is, in fact, fun, and (C) you know how to actually implement it. Until you can demonstrate those things, how are you supposed to estimate how long it will take to build? Or that, once you're done, whatever gameplay mechanics you dreamed up are actually entertaining to a player?
[0] arcade game programmer, producer / studio exec who got Insomniac and Naughty Dog to scale beyond their founders, & eventual lead architect of the PS4 and PS5
Do we really pretend this is an architecture decision? This is the classic Agile pattern where management wants to show results fast for reward, which leads to a huge tech-debt build-up, as layers are not properly designed and reusable.
It’s neither. There’s no management or promotions or really any incentive to do this other than it’s fun. It’s also not an architectural decision, just how they as a team decide what to work on.
But how does developing a half-baked browser that targets some websites for fun refute that building a browser is impossible? Doesn’t it provide another example that it is impossible, at least for this team?
Well, it takes time to make something big. And, one way to do it is to choose end-to-end functionality. Idk, it doesn’t seem so controversial to me. I’d wait and see before calling it half-baked.
A vertical slice of the properly integrated features needed by some practical use case is certainly more efficient and effective than implementing the whole of a large API (e.g. some fancy recent, unproven and unstable CSS module) "in a vacuum", getting numerous rare cases wrong, and struggling to test the new features.
This is a common misconception. I practice TDD at work and side-projects and find it very productive. TDD is best done with end-to-end tests (or automated integration tests, whatever you wanna call it). You write an end-to-end test (I give input A to the entire system, expect output B), first the test fails (because it's unimplemented) and then you implement and it passes.
It works because then your tests become the spec of the system as well, but if you only write unittests there is no spec of the system, only modules of your code. Which is not useful because if you refactor your code and change this module you need to rewrite the test. Whereas in TDD your tests should never be rewritten unless spec changes (new features added, new realization of bugs etc). This way "refactor == change code + make sure tests still pass".
You're of course free to write unittests as well, when you see fit, and there is no need to target a religious X% coverage rate at all. I think coverage targets are cargo-culted, unnecessary, time-consuming and religious. The crucial thing is, while you're writing new code (i.e. implementing a new feature or solving a bug) you need to write an automated test that says "if I do X I expect Y", see it fail, then see it pass, such that "if I do X I expect Y" is a generic claim about the system that will not change in the future unless the expectation from the system changes.
In other words, the example in this comment chain: "run a game, see 'opcode X doesn't exist', implement X, rinse repeat" is actually how TDD is supposed to work.
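As a sketch (a hypothetical run_program() boundary over an imaginary NOP/INC instruction set, not any real project's API): the test only states "feed in this ROM, expect register value 2", so it fails until INC exists and keeps passing through any later refactor:

    #include <cassert>
    #include <cstdio>
    #include <vector>

    // Hypothetical system boundary: run a tiny ROM (0x00 = NOP, 0x01 = INC)
    // to completion and report the final register value.
    unsigned run_program(const std::vector<unsigned char>& rom)
    {
        unsigned r0 = 0;
        for (unsigned char op : rom)
            if (op == 0x01)
                ++r0; // INC; NOP (0x00) does nothing, anything else would abort
        return r0;
    }

    // End-to-end test, written before INC was implemented: input X, expected Y.
    // It never mentions the internals, so refactoring them can't break it.
    int main()
    {
        assert(run_program({ 0x00, 0x01, 0x01 }) == 2);
        std::puts("end-to-end test passed");
    }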
>Console emulators (especially of the HLE variety) seem to have a similar flow.
And emulators that have gone down this path have all regretted it, because they end up making hacks to make <popular game> work, because everyone simply wants to play <popular game>. Dolphin is still paying the price of that method years down the line. Project64 took years to unfuck themselves, ZSNES is forgotten and overtaken by many others that have done the proper thing.
So, sure, you can get some initial usage. But making a browser isn't about being able to open twitter.com
I feel like you may have misunderstood what kling is saying. He's not saying "we will cut any corner to get Twitter to load", he's saying "we look at what it would take to get Twitter to load, read through the relevant specs, and try our best to implement the required features cleanly and correctly".
Loading Twitter (etc.) is not really the goal, it's more of a prioritisation mechanism for tackling a huge spec. Actually getting Twitter to run is a nice reward for all that hard work though, and a series of such rewards keeps the contributors motivated in the marathon that is building a browser.
No, I understand what Andreas is saying. But the reality is, when you read and implement specs for a specific website, you end up cutting corners, even accidentally. Maybe Twitter relies on a particular behavior of fetch() that was screwed up in Chrome 97 and has had to be kept for backwards compat this entire time. Maybe it uses some CSS that never got properly documented or specified.
By targeting a single website, you end up accidentally writing those site-specific fixes _into_ your implementation. You only realize it's fucked up because you visited Twitter. But maybe it screws up another site. Maybe something else depends on a quarter of that functionality, and you've accidentally broken it.
It seems that GP already addressed your concern: "read through the relevant specs, and try our best to implement the required features cleanly and correctly"
Of course some errors will be made along the way. That's to be expected, regardless of the approach taken.
Maybe so. Hopefully you’d find that out while looking at the spec for that particular function. If not, you may have to rewrite some code as you learn more. Life goes on
> ZSNES is forgotten and overtaken by many more that have done the proper thing.
When ZSNES did its thing, which, just to remind people, was back when Pentium IIs ruled the roost and CPUs topped out at 450 MHz, doing things the proper way was not a choice, because the proper way needed 3x the CPU power of doing things the ZSNES way that worked.
Dolphin became popular because it could actually play games, and I bet if they had instead spent an extra 3 or 4 years working on code that was "correct" without releasing anything, well odds are they wouldn't have such a large following and would not have attracted so many contributors.
Users do not benefit from "perfect code" that they never get to use because it is still in development.
Since “The reckless, infinite scope of web browsers” is depicted at the start of the article, I think it’s worth pointing out that its claim of W3C having 1,217 specs totalling 114 million words is wildly wrong, probably by 2–3 orders of magnitude in the total. The considerable majority of the documents considered were not specs or not web-relevant, and dozens of versions of the same thing were often counted. Source: https://news.ycombinator.com/item?id=22617721.
So the web is maybe only 1.2 specs with 114,000 words? I think it's considerably more than that. If that estimate is off, it's by no more than a factor of 10, IMO. No need to exaggerate.
In the source comment he gives it's more explicit, but the "2-3 orders of magnitude" is referring to "the total", i.e. the 114 million words, not the number of specs.
Given that someone that needs to develop a browser probably needs to hunt and peck through all that trash to find the relevant bits of information, is it not actually more damning that such a vast quantity of irrelevant cruft exists?
Is this not the corporate equivalent of creating a walled garden (perhaps not the right phrase here, gastric moat sounds more apt), by exhausting the resources of all that should choose to attempt to scale this mountain of junk?
That being said, I can't make any suggestions as to how you could shortcut through that other than just having decades of experience in the field.
Isn't that true for any system that's been around for a few decades? Try implementing XMPP; which XEPs do you pick? It's a long list.[1] Try implementing email: there's probably more RFCs to exclude than include at this point, and what do you need and what is optional?
This is in fact one of the big issues with XMPP. Everything is sorta-kinda compatible but not really. And email is getting so complicated that many people are scared of running their own server, let alone programming one.
One thing the web specs do incredibly well is cross-linking. I've found it quite easy to start with a high-level spec (e.g. flexbox) and drill down into the bits I need because anywhere another spec is referenced it's linked to directly.
I believe it’s still wildly wrong. Had arp242 not spoken up at that time, I’d have been saying something similar, because the numbers were to me blindingly obviously extremely unrealistic. The entire HTML Standard (which is somewhat of a misnomer now: it covers much more than just HTML, quite a bit of CSS interaction, other web platform functionality, JavaScript APIs, and the like) is now about half a million words by the most generous of counting methods (it’s written in a fairly verbose style, which is a really really really good thing when you compare it to the average IETF RFC), and I suspect it’s bigger than everything else put together, apart from ECMAScript (around 270,000 words¹ and probably growing at a faster rate than all other specs: it’s written in an even more verbose style, most of which is effectively straight code in prose form, whereas in the HTML Standard “straight code” is only a decent chunk of it).
As for WebGL, the WebGL parts are actually quite small. https://registry.khronos.org/webgl/specs/latest/1.0/ is only about 20,000 words. I gather it defers significantly to GLES20 (PDF, 204 pages, ~60,000 words), and GLES20GLSL (PDF, 119 pages, ~30,000 words), and it has GL32CORE in its references (PDF, 404 pages, ~125,000 words), but doesn’t actually cite it in the text and I don’t know if it’s relevant. There doesn’t look to be anything else significant that wouldn’t already be included.
But really, WebGL is a fairly thin layer atop OpenGL ES 2.0, just removing some functionality and applying some restrictions. I believe you would reasonably expect a browser to use an existing OpenGL ES 2.0 implementation, so I’d be quite content to exclude the 90,000 (or perhaps it’s ~215,000?) words of that, just like it’s common to reuse an existing JavaScript engine (though you also don’t have to). Yet note this: it seems that even if we include it all (and presuming I haven’t missed anything, which I admit I could easily have done, I’m not conversant with these specs like I am with HTML/CSS/JS specs), it’s still under 0.2% of Drew’s massively-inflated figure.
—⁂—
¹ Whew, https://262.ecma-international.org/ took me several minutes to download, despite being only 7MB. Sigh; the trials of being in Australia, where things hosted in the USA are often inexplicably painfully slow—like, sub-256kbps. When already downloaded, it renders in under four seconds, which is really fairly impressive when it’s doing all that layout on a document a million pixels tall—this ain’t a PDF where you can only render one page at a time. The HTML Standard is almost two million pixels tall, and also loads completely in under four seconds—simpler styles, perhaps? I refer to it often enough that I build it locally so I don’t have to download its 13MB all the time, or compromise with the multipage version that you can’t search through as easily.
To underscore something that might get lost in the wall:
The entire premise given in Reckless, Infinite Scope is that the number of words in the specification is positively correlated with the intractability of implementing a given thing. From this foregone conclusion, it tries to quantify how much worse the task of implementing a Web browser is. The problem is that the premise is a bad one; even if it takes more time to read a wordier spec, it is easier to implement one that describes well-defined behavior than a terse one that glosses over things and leaves huge gaps of undefined behavior. This is not just conjecture—it tracks with the development and progress of implementing, say, the HTML parsing algorithm; it is easier to implement a correct and acceptable HTML reader in 2023 armed with only the spec than it was to try to do the same thing in 2003, which involved reading the spec and also reverse engineering how other (esp. proprietary) browsers deal with the pages that you find authors actually publishing in the wild. This is a task that was made easier because the standard got bigger.
The point is that its broken methodology doesn't even matter; we don't have to try to come up with better ways of evaluating whether a spec should be included or not because its whole premise is flawed to begin with. Any attempt to produce an input set that you can then use to run a word count analysis is a moot academic exercise at best that will only tell you how many words it contains.
A more detailed spec might have more concrete definitions, but it also means more actual code for someone to write. With an under-detailed spec you might have a switch with a couple of defined values and an undefined catch-all. A super-detailed spec just adds case statements and requires more code to handle them. The detail in the spec makes for lower cognitive load, but the code still needs to be written, and ideally tests written.
> A more detailed spec [...] means more actual code for someone to write.
No, it doesn't. A detailed spec has the same amount of code to write as a spec for the same thing with less detail; for the types of specs relevant to this discussion, the primary requirement of "does what the other browsers do" exists whether the details are made explicit in the spec or not. More code is a consequence of an increase in requirements, not detail.
In any case, neither circumstance is I/O bound to begin with.
No. If a spec doesn't define a behavior code can jump to some "undefined" handler which could be anything from a no-op to some quirks mode. Unless you're Microsoft writing specs "do what Word 97 does", copying the behavior of existing browsers is not a specification.
Please don't ignore the context. We are talking about Web browsers.
You don't, in reality, have the latitude to do "anything from a no-op to some quirks mode" of your choice. The requirement is absolutely the one stated: to be compatible with what other browsers are doing. If your browser doesn't satisfy that requirement, then you break the Web, regardless of whether the spec is a hundred words or a hundred million. No amount of pointing at a standard and arguing that it doesn't specify clearly defined behavior in some area will ever be enough to teach a site to be able to say, "Oh, I'll just unbreak myself then so you can go ahead and view/use this page on your computer."
Besides that, even if you were right—and to be clear, you aren't—that doesn't change the fact that, again, arguing for underspecification because "a couple defined values" isn't as much "actual code" that "still needs to be written" is an argument that approaches a problem that isn't I/O bound as if it is.
> and I suspect it’s bigger than everything else put together, apart from ECMAScript
The thing is, it's not just the HTML standard. It's also all the standards it references. And all the standards they reference, and all the standards those standards reference, ad infinitum.
For example, HTML 5 references SVG 2 which references CSS 2 which references Unicode and XML 1.1. Or, to go the same route, HTML 5 references SVG 2 which references CSS 2 which references ICC.1:2004-10 (Profile version 4.2.0.0) Image technology colour management which references (normative) ISO/IEC 646:1991, Information technology — ISO 7-bit coded character set for information interchange, IEC 61966-2-1 (1999-10), Multimedia systems and equipment — Colour measurement and management — Part 2-1: Colour management — Default RGB colour space — sRGB and TIFF 6.0 Specification, Adobe Systems Incorporated among other things.
Yes, some of those overlap (as many standards will reference many of the same standards), but the number of those standards is definitely non-trivial. Some of them you can probably pull in as system libraries or external libraries. The question is, how many?
Edit: and some of them are definitely not relevant to the web, but how would you know until you read through the spec that references it, and through the referenced spec to find and understand the relevant bits?
Essentially any specification that includes any kind of image support will include this kind of chain of specifications; just as any system that does networking will eventually end up with TCP, any system that does text ends up with Unicode, etc. Even the simplest possible 1995-esque browser will have to deal with that (support for images was added in 1993, and text and networking were always central).
> Even the simplest possible 1995-esque browser will have to deal with that (support for images was added in 1993, and text and networking were always central).
Making a web browser from scratch is like making a hamburger from scratch: the problem is not the first part, but what you truly mean by "from scratch".
ISO 646 and 61966? I won’t disagree with your annoyance with ISO water torture[1], but ASCII and sRGB are not the examples of needlessly sprawling web of references I would’ve chosen. Even if sRGB is an utter mess[2], it’s a mess you essentially have to use if you’re doing colour on computers.
I just randomly selected some without going too deep into details. But yes, sRGB is also referenced from CSS because, you guessed it, CSS deals with color :)
How do you even start implementing such a spec? I know it's probably a dumb question, but how would one structure their code to check all those boxes? Does it usually involve reading the entire spec, then figuring out the foundational parts and building from there? Does that work when you need multiple specs to "fit" together?
How do you make sure or even check that some code you built for some part of the spec does not interfere with something else?
I implemented a few specs in my short career but nothing even close to that. It's actually mind boggling that we manage to have all those moving parts fit together.
Other than layout and rendering, implementing HTML, ECMAScript and CSS is genuinely easy. There’s a lot of it so that it’ll take you a long time, but it’s very much not hard, because the HTML and ECMAScript specs fully spell out the algorithm, telling you exactly what you must do (or, more precisely, what you must be equivalent to doing: e.g. “implementations must act as if they used the following state machine to tokenize HTML”), so it’s largely mechanical. This is very unusual in specs. I wish it were less so.
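To give a flavor of that "act as if you ran this state machine" style, here is a deliberately tiny toy tokenizer (nowhere near the real thing, which has dozens of states and handles attributes, comments, character references and error recovery), just to show how mechanical the spec's approach is:

    #include <cstdio>
    #include <string>

    // Toy three-state tokenizer: splits text characters from tag names.
    enum class State { Data, TagOpen, TagName };

    void tokenize(const std::string& input)
    {
        State state = State::Data;
        std::string tag_name;
        for (char c : input) {
            switch (state) {
            case State::Data:
                if (c == '<')
                    state = State::TagOpen;
                else
                    std::printf("character: %c\n", c);
                break;
            case State::TagOpen:
                tag_name.assign(1, c);
                state = State::TagName;
                break;
            case State::TagName:
                if (c == '>') {
                    std::printf("tag: %s\n", tag_name.c_str());
                    state = State::Data;
                } else {
                    tag_name += c;
                }
                break;
            }
        }
    }

    int main()
    {
        tokenize("<p>hi</p>"); // tag: p, character: h, character: i, tag: /p
    }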
The question is what exactly is needed for a useful and functional browser. You certainly don't need all features from Chrome, but you do need more than, say, Lynx or Dillo.
Is WebGL needed? I've browsed the web for years with it disabled and have not suffered any inconvenience. I'd probably say it's not needed, but I'm a bit on the fence about it and can understand if people would disagree. All browsers implement XSLT, but is that actually needed for a functional modern browser? Maybe not? I can't remember the last time I've seen it used, but perhaps it is. And do you include HTTP? Or is that too low-level? Do you include PNG and SVG or just PNG? If you include SVG then why not PNG?
There are some obvious "we need this", some obvious "we don't need this", and a lot of unclear and somewhat subjective area. I do know that you can't really say "yes there's bad data, but it probably cancels out against stuff omitted"; if anything, it only underscored my point that the list is not good.
An uncurated or minimally curated document dump is not the correct approach in the first place, if you do that for SMTP you'd end up with a lot of irrelevant documents too simply because the specification is a few decades old and stuff gets superseded, some things never sees real-world implementations, things no one uses any more, etc.
I started making a better list when the article was originally posted, starting from "okay, let's just check what you need for a useful browser normal people can use every day" and ended up with a few dozen things, but I never really posted it as I wasn't quite sure that was fully correct either and because I never really figured out some of the questions above.
I think most of the complexity stems not just from the word count, but rather from the fact that everything interacts with everything else. Consider the relatively new "position: sticky" in CSS. Okay, great. But it doesn't work well with flexboxes, or RTL, or negative margins, or z-index, etc. etc. [1] Adding what seems like a fairly simple feature is quite complex because it interacts with so many things. It's not hard to imagine a fresh new HTML and CSS which allows all the features the current ones do but does so in a much simpler and more orthogonal way, which would of course break backward compatibility and every website.
[1]: In 2020 anyway; I'm not sure on the current state; here are some of the links of my post from 2020 which like most of my posts I never finished:
> I think most of the complexity stems not just from the word count, but rather from the fact that everything interacts with everything else.
And most specifically in layout and rendering. HTML, JavaScript and the parts of CSS that aren’t, y’know, doing anything, are all very straightforward, despite having the significant majority of the word count. If anything, I’d say that in web matters implementation difficulty is inversely proportional to word count, because its verbosity pretty consistently comes from precision (which makes implementation easy). Layout stuff would be much harder to define exhaustively in that fashion, nor is it done so in most places.
> I think most of the complexity stems not just from the word count, but rather from the fact that everything interacts with everything else.
That is definitely the main issue.
And you're completely correct on the needed/non-needed/subjective front. Many of the standards reference (in a recursive manner) a lot of other standards. I listed some here: https://news.ycombinator.com/item?id=35524018 As an outsider it's impossible to know whether the TIFF spec or the ISO 7-bit coded character set for information interchange are relevant and need to be studied, or are there just because they define some minor values referenced in some higher-level spec.
It's not hard. Start with the WHATWG's spec, then incorporate the other specs it references using a reasonable heuristic to determine if a given item should be included or not.
If you don't think the estimate from Reckless, Infinite Scope is wildly off, then you either didn't read the methodology and do a spot-check of the dataset, or you really don't understand the scope of what gets published by W3C and how little much of it has to do with Web browsers or how many revisions of them there are.
The only bar that the heuristic has to pass here is "delivers a result that doesn't suck as bad as the analysis in Reckless, Infinite Scope". The analysis in that article is so bad, however, that your heuristic can literally be, "if you encounter an item that was also in Drew DeVault's input set, then assign an arbitrary probability 0.9 (or whatever) of whether the item should be counted", and it would still give you a more realistic result than what the article says (and that people are actually relying on in their arguments—and that you are defending) here.
Aside from that, given how many logical errors and weird counterconclusions[1] you've managed to stuff into this discussion, though (and to have been able to do so economically[2]), I'm going to go ahead and say this is my last response to you that I spend more than 10 seconds writing out.
Your example is heavily underspecified. In what form are people’s details added? How is the list printed? A spec that’s actually implementable will be a good deal longer. One that defines behaviour completely (what ordering should you use for equal weights?) will be longer still.
The HTML and ECMAScript specs that comprise most of what we’re talking about are very much closer to line-by-line, because they’re designed to be both implementable and completely specified.
Write a program that keeps track of the name and weight of each person added, sorted by weight, lightest to heaviest. The input should be a command line prompt asking for input in 3 fields - first name, last name, weight. If two people have the same weight, order them alphabetically by last name. At the end, when a blank line is entered, print the list with headings first name, last name, weight. Check the input, if it's not 3 sections or empty, print an error explaining the input format.
497 input characters, 1317 output characters.
In the case of a detailed or verbose spec, you're probably right. I'm just replying to the assertion that it generally takes many words of English to equal little code. If that were true, nobody would be using ChatGPT to scaffold.
Now, if you're going to be detailed about -how- each line should look, I'd agree that English would be more verbose than code.
Still seriously underspecified for an interoperable spec. You haven’t defined the input or output forms anywhere near precisely enough, or how to order alphabetically by last name (sorting depends on locale: e.g. is æ equivalent to ae—though that still raises stability questions—a letter after a, a letter after z, something else? I think there are languages that treat it as each of these. Or are you just rejecting anything beyond ASCII letters, which will cause different trouble?), or what to do about two people with the same weight and last name.
Web specs need to consider all of these sorts of things. That’s why they’re verbose—they’re designed to be implementable and complete.
The fact there are better specs doesn’t help if a large part of the work is handling things that are outside the spec. You better show every “buggy” page similar to how the major browsers show them or the new browser will be considered defective. That’s the unfortunate reality of web tech (I wish every page with a JS error or incorrectly closed tag would be a big fat error message, but it isn’t). And that’s still a lot of slow guesswork, I imagine.
> You better show every “buggy” page similar to how the major browsers show them or the new browser will be considered defective.
Used to be true, I doubt that it is anymore.
There are too few (I could find exactly none, to be honest) sites around these days that are unreadable when rendered strictly according to a newish (say, 2019) HTML/JavaScript spec.
The proliferation of front-end frameworks means that almost no site is going out of spec, and because any site that doesn't meet a large portion of the spec is invisible to search engines, having the site be broken when sticking to the various specs is no issue.
In short:
1. With practically all large-traffic sites using a framework, a browser that strictly sticks to the specs and the specs alone is not at a disadvantage.
2. With the importance of SEO, a site that is unreadable on a recent spec is not going to be found anyway by the large body of traffic.
Conclusion: a browser that sticks to the spec and the spec alone has a fighting chance.
It's the complexity and edge cases of an exceptionally large, complicated and self-contradictory spec, with thousands of edge cases when different parts of the spec are combined, that's the problem.
All this "quirks mode" stuff is part of the specification, no? It makes it all a bit more complex than it has to be, but I do believe it's specified.
I'm not really sure if "you need to be bug-compatible" is still true; it probably was 15 years ago, but Chrome, Firefox, and WebKit tend to be pretty decent these days.
Quirks mode is one thing, but most browsers have specific rules for specific websites, a manual process to update and handle those cases. Pretty sure Chrome and Safari have hundreds of these rules.
It's not clear to me if those are due to shortcomings in WebKit, the site, or if it's to be "bug-compatible" with anything else. Either way, 1,600 lines of code doesn't seem a lot to me.
If anything, websites have become way less clean, with more invalid HTML. I remember people, including myself, putting W3C validator icons on websites. Rarely do I see any these days, because of all the invalid HTML and dynamically created websites. Maybe all the tags are closed nowadays, so maybe at least that. But which elements are used inside which other elements, and whether they are used semantically appropriately, is another matter.
One of the ideas behind HTML5 is that while there is some concept of validity and well-formedness, essentially any random stream of bytes describes exactly one DOM tree; in some cases the resulting tree is surprising, but even then it should be the same across all conformant parsers (modulo scripting support).
The end result is that validation is not that interesting anymore, because the idea was that a valid (X)HTML document should parse the same across all browsers (which it mostly did, but that did not say much about how it was actually rendered).
Like most people I gave up on the whole semantic pedantry a long time ago. Correct header ordering, basic semantics like <nav>: sure, that's great. But "no <p> inside <dt> allowed!" just makes no real sense and is exceedingly pedantic.
The validator badges were kind of a backlash against the tag soup of the day; part of the reason for that was that everyone who knew how to program a VCR could get employed as a "webmaster" in those days, but also because the authoring tools for non-tech authors weren't as good. HN sees a lot of posts from non-tech people, often written on WordPress, Medium, or whatnot. 25 years ago it would more likely have been "tag-soup'd" by some non-tech person who just learned a bit of HTML.
Nowadays, HTML parsing is exhaustively defined in the form of a couple of state machines, so it’ll behave the same everywhere. It’s genuinely easy to implement perfectly (though it’ll still take a while because there is quite a bit of it).
> The fact there are better specs doesn’t help if a large part of the work is handling things that are outside the spec
The way that web specs are handled means that better specs actually bring a lot of those things into the spec. i.e. browser implementers will define a new spec that clearly explains the quirk, and then align on the implementation. There is also a huge test suite which can be used to test conformance.
It's not perfect, but it's definitely a significantly better situation than we had.
It looks like they ported Qt to SerenityOS. I saw a package called "qt6-serenity". Perhaps they use the SerenityOS GUI libraries underneath. Does anyone know?
Ladybird is a browser based on SerenityOS technologies that uses Qt as the GUI framework. In SerenityOS, they have their own browser using the same underlying technologies, but a different in-house GUI framework.
This is similar to how WebKit and Blink have their different counterparts, like QtWebEngine or WebKitGTK. The equivalent of WebKit and Blink in SerenityOS is called LibWeb.
Hmm, I was actually asking if SerenityOS released a "fork" of Qt that re-implements the QPainter class to use their native GUI API. I assume yes. QPainter is turtles all the way down to painting pixels on any platform -- macOS, Win32, X Windows, Wayland, Android, iOS, embedded (auto), etc.
The "modern" distinction would be they are building a browser + browser engine, while most fancy new browsers tend to just reuse existing engines, like Edge did with Blink.
I would really like to see a version with a C API capable of being embedded, there's a lot of places where a lightweight HTML renderer would be useful, plus it would make it easier to port to other hobby kernels.
Agree. This market is currently cornered by WebKit, which is easily the most language and UI framework agnostic engine there is. Blink might be similarly easy to embed but I’ve not seen it used that way — Blink embedding tends to be via Electron, CEF, or Qt, whereas you might run into WebKit in some random program written with any number of UI frameworks.
There used to be Gecko as an option here too, but Mozilla decided that it shouldn’t be usable outside of XULRunner and made it effectively unembeddable unless you’re willing to commit to XUL.
I care more about the choice of licence… and sadly enough, they went with a pushover/permissive one. This seems like the kind of project where copyleft is by far the better choice.
Yes, but they are replacing it bit by bit - I mean, they even started Rust for exactly that purpose. So (without being a huge fan of Rust) the decision to start a "greenfield" browser project in a memory-unsafe language is questionable IMHO...
I don't have an over-time series, but if you're willing to take my memory at its word Rust's percentage has hovered at around 10% for a while now. It seems to have actually gone down recently. Combine that with efforts like Servo being wound down and their team being let go, and it makes me wonder what the future of Rust looks like in Firefox.
If anyone can shed some light on this I'd be interested to know.
I think they stopped the rebuild. They were previously building a new browser engine called Servo. Some of that work made it through to Firefox's Gecko. And then the team was gutted.
This chart looks like it could use some filtering for what constitutes a language used to build Firefox. It seems questionable that HTML is used to build 16 % of it. I suspect that is a result of test cases being included in the chart, as it is based on the whole repo. I checked out the repository and it doesn't have the GitHub language bar I see on other repos, so I can't click the HTML bit in it to filter down the HTML files in the repo and see if they are mostly tests or not, but it is hard to imagine they would be anything else. Maybe bits of the browser chrome, but still, that wouldn't be a whole 16 % I think.
> Yes, but they are replacing it bit by bit - I mean, they even started Rust for exactly that purpose. So (without being a huge fan of Rust) the decision to start a "greenfield" browser project in a memory-unsafe language is questionable IMHO...
Maybe, but the speed with which SerenityOS, its programs and the browser has been implemented, with so few man-hours thrown at it kinda displays why C++ was chosen over Rust.
There is no comparable project in Rust that demonstrates just how quick you can go from "nothing" to Full-Fledged OS, with applications, with a browser.
Just from the Serenity project (if you've been following it), it looks like C++ is about 10x faster to write performant and safe code in than Rust.
Sort of... but that could be due to a host of other factors besides which language is better: how good the core developer(s) are at community-building, how committed they themselves are to the project... hell, even the fact that one project used GitHub (which reduces the friction for developers who are already on GitHub to start working on the project), while the other one has its own GitLab might be relevant.
Note that SerenityOS started in 2018, they decided to use C++ for it, and even the newly created language for safer userspace (Jakt) generates C++ as target.
I do mostly Python at my day job, but for low-level side-projects I've gotta say C++ with the C++17 or C++20 standard is way faster to iterate with than, say, Rust or even something like Zig.
For me iteration speed is a big selling point, which (plus the fact that it's easier to find contributors) might also be important for projects like these.
I’m puzzled by your comment. I have been an expert in Rust for almost a decade, but am only mildly conversant in C++, and have no interest in actively learning more C++.
Rust seems to me far easier to learn and get going in due in major part to its incontrovertibly superior standard tooling.
I can’t see any place for any meaningful difference in iteration speed between the two, save that you may well have to iterate more in C++ due to memory safety bugs the compiler doesn’t catch.
As for finding contributors, I get the impression that Rust is considerably more accessible, and thus will increasingly find contributors more easily, as people that just love programming will actively choose to learn Rust far more often than C++. (For the current state of affairs, I think it’ll depend on what sort of contributor you’re looking for, in skill, industry, paidness, &c. Some segments will certainly go one way, and others certainly the other.)
Iteration speed with both Rust and C++ is abysmal. Builds take for fucking ever on large projects and it's just slightly less bad for small-to-medium and medium-sized projects.
With Rust, though, it's as if someone looked at C++ compilation times (not to mention resource requirements) and said, "I think we can find a way to make it worse."
I have a hard time deciding where in this thread to drop this link, but maybe here is a good spot. Andreas has a video about this topic, and I believe it's this one: https://www.youtube.com/watch?v=vAZvTFoSIFU
Last time I tried the browser, most crashes were the typical "not implemented yet" code paths. Some feature wasn't built yet, so necessary flags weren't set, so the program caught the invalid state and died.
I don't think I've ever seen a crash in either Serenity or Ladybird that I could attribute directly to memory management. For volunteer C++ projects, their memory management seems to have been done excellently. Using modern C++ features and things like error return types instead of null seems to be a key part in making the browser this good.
It's also worth mentioning that as far as I know Ladybird doesn't implement a JIT engine, using bytecode to execute Javascript instead. That should also make life significantly easier for memory management.
It's still a young browser and I'm sure there are some nasty memory corruption bugs lurking in the depths, but I haven't seen those yet.
There are some languages that can be formally verified, and have themselves been formally verified.
Formal verification is a complete pain in the ass to do and there's a reason it's mostly done only in the most critical of systems, but if a program passes 100% formal validation, you're as close to crash free as you can possibly be.
I believe Ada and some other lesser-used languages sport well-supported formal verification methods. You won't be able to use C/C++/Java/Rust if you're going for 100% formal verification though. There are attempts to bring the concepts to more commonly used languages (Frama-C, for example) but in my experience they're stuck in PhD-ware hell, great for writing papers but terrible for writing actual software.
Frama-C is used in production to meet normative requirements for critical software, at least at Airbus (DO-178C), THALES (CC EAL6/7), and EDF (IEC 60880). I think that counts as actual software in production.
But there are actually other languages that are better suited to such methods: More or less everything from the functional space is quite well applicable to those.
This post doesn't give reasons to doubt that building a new state-of-the-art browser is basically impossible now. It might well be possible to build a new browser that kind of works on many popular websites, and that would be surprising enough. But the amount of work needed to build something comparable to the rendering engines of Chrome, Firefox, or Safari, something really usable, would probably take decades rather than years. If it is possible to catch up at all. (I remember once seeing a graph which compared software projects by lines of code, and browsers were only topped by a few things like major operating systems.)
The vast majority of the complexity is in the engine. One can whip up reasonable chrome (browser UI) in whatever UI framework one prefers in a few days tops. While there are slightly more involved parts like writing the bookmarks and history systems, those are pretty run of the mill tasks that can be completed in a relatively short period of time.
Yeah, I think servo had the right idea. The main problem with servo's components is that they're severely underdocumented, which has made it harder than it should be for some components (like webrender) to become widely adopted (some of the other ones like html5ever and cssparser are widely used).
I suppose it depends on what the goal was. If the goal was to end up with widely reusable web browser components, the lack of documentation might've been a problem; but if the goal was to improve Firefox, it seems to have been a smashing success.
It has been good at improving Firefox, but if you look at the wider picture you see that Firefox has been falling in usage and struggling to keep up with WebKit/Blink. And IMO a large part of that is because core parts of Firefox (Gecko and SpiderMonkey) are much less widely used (by as many apps/companies) than equivalents like WebKit/Blink/V8/JSC, and this is because they are not easily embeddable and their codebases are harder to work with.
From this perspective, not focussing on documentation and making components usable externally is pretty short sighted.
More specifically, Chrome used Safari's rendering engine, and Safari used Konqueror's rendering engine, because even in 2001, starting a browser engine from scratch seemed like too much work.
I would say that in 2001, starting a browser engine from scratch was more work than today, because (as other commenters have noted) since then the specifications have become more robust and the "tag soup" sites not following the specs have become fewer.
“Invalid HTML” is completely irrelevant. HTML parsing is defined exhaustively; “parse errors” are purely “you probably made a mistake, but I’ll keep going” indications, and all browsers will do the same thing.
And that's why we can't have nice things, er, why we will never have valid HTML on a significant percentage of websites: browsers are historically very lenient with HTML errors (because otherwise they wouldn't be able to show 90% of all sites), and no one uses HTML validators to check if their HTML actually conforms to the spec. It's a chicken and egg problem really: the browsers can't be more strict because there are so many broken sites, and the sites won't be fixed because the browsers aren't strict enough.
I did a quick check, and most of the errors it reports are "unknown attribute" or "element such-and-such not allowed here". Those "errors" would be allowed anyway for forward compatibility, and aren't really a big deal.
IMO the validator's definition of "invalid HTML" is just too strict; it should only count parse errors and completely non-sensible things. And the specification is also too strict at times; on my own website I have "Element style not allowed as child of element div in this context." This is because on some pages it adds a few rules that apply only to that page and this is easiest with Jekyll. I suppose I could hack around things to "properly" insert it in the head, but this works for all browsers and has for decades and why shouldn't it, so why bother?
If the specification doesn't match reality, then maybe the specification should change...
This does not surprise me at all, with all the "must be a web app" thinking, HTML still being treated mostly as a string in many web frameworks, and tags being used in semantically inappropriate ways. It is exactly as I thought: the ratio of invalid HTML has become even worse. Probably most web devs these days do not even check their websites for HTML validity, because achieving it with the frameworks they chose is hard or impossible.
Moreover, tools are better too. Even if one is still using only vim on the terminal, plugins work better, screens are bigger, code compiles faster, and the internet has better resources for everything from programming and communicating with your team, to just finding music that helps you stay productive, for example.
In 2001 the entirety of HTML+CSS spec was probably less than just some of CSS modules like CSS Color.
Today the complexity lies not in the robustness of the specs, but in the sheer number of them, and their many interactions. I mean, just distance units... There are over forty of them.
Guys, I don't mean to sound miserable, but please don't turn this into Reddit comments with puns and jokes. Let's keep the signal-to-noise ratio optimal.
The specs really are drastically better than they used to be. Compare the modern specification for CSS Table Layout (https://www.w3.org/TR/css-tables-3/) with the older CSS2 one (https://www.w3.org/TR/CSS2/tables.html). The older one doesn't even attempt to define the "automatic layout algorithm" at all!
> Not Ready For Implementation
> This spec is not yet ready for implementation. It exists in this repository to record the ideas and promote discussion.
> Before attempting to implement this spec, please contact the CSSWG at [email protected].
It means that the spec is a draft and that it hasn't been finished yet (there may still be bits missing or wrong). But it is clearly already so much better than the old version.
- the web comprises numerous specifications: HTTP, HTML, CSS, JS, SVG. At worst, a regular compiler needs to worry about macros and the language syntax
- each of those specifications has numerous versions which, in some cases, can be significantly different from other versions of the same language or protocol. A language compiler generally only focuses on one version of that language
- A browser needs to support broken websites. A compiler only needs to fail gracefully
- A browser's output is graphical, which is much harder to unit test
In short, you’re dealing with a harder problem across a broader number of specifications. I would liken writing a browser more closely to writing a new graphical OS than writing a compiler.
(“Browser” here means “browser + engine et al” and not just a reskin of Chromium).
A browser rendering engine's output isn't purely graphical, and most things can be tested through other means such as by reading console.log output, looking at the DOM, or looking at computed CSS styles and/or bounding-box information.
In fact there's a good reason to keep graphical tests to a minimum: web specs do not dictate things down to the pixel level, so pixels can shift around from version to version, requiring the occasional golden data rebase.
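Here's a hedged sketch of what a non-pixel layout assertion can look like (the `#subject` fixture and `assertEqual` helper are hypothetical, not any particular engine's harness):

```ts
// Minimal sketch of a non-pixel layout assertion (browser test context).
// "#subject" and the expected values are hypothetical fixtures.
function assertEqual<T>(actual: T, expected: T): void {
  if (actual !== expected) throw new Error(`expected ${expected}, got ${actual}`);
}

const el = document.querySelector<HTMLElement>("#subject")!;
const style = getComputedStyle(el);
const box = el.getBoundingClientRect();

assertEqual(style.display, "flex");            // computed style, not pixels
assertEqual(style.color, "rgb(0, 128, 0)");
assertEqual(Math.round(box.width), 200);       // spec-derived geometry expectation
```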
Fun aside: Chrome's test suite contains a font named ahem.ttf where (almost) every character is an identical black rectangle. This allows tests to include text without relying too much on the details of a particular font.
The main reason it's difficult is that the output criteria seem properly defined but actually aren't at all.
Yeah it's "just" building some parsers and figuring out live updates, but you have to keep in mind that this is ~the internet~. People have been uploading broken, against spec, webpages since forever. Coding a web browser as a serious project (so not as a flight of fancy) borders on the impossible mostly because of that.
The main sites people test against/use aren't the "simple" CSS/JS/HTML sites from the past. Few people will care for a browser whose main job is to be able to render a neocities website. People want their popular sites working - Discord, Facebook, reddit, twitter. All of those are big JS apps.
The real bugbear here is JS though; HTML and CSS are complex but workable. JS is an ever-moving target as spec implementers (mostly Chrome) dump more and more of the jobs a browser was meant to do as the user agent into JS[0]. (And that's without delving into how Widevine became part of the spec, which means it's legally impossible to make a fully spec-compliant browser.)
Polyfills can offer a lot of fallback/lenience, but polyfills are a moving target too (see the sketch below): older browsers get deprecated and polyfills get removed for performance/optimization reasons, so your baseline spec for functional JS keeps growing unless you somehow convince the people making popular JS libraries that your browser project is important enough to keep the necessary polyfills around for.
[0]: Presumably so that Google can take away the User part from the browsers job as the User Agent, but typically covered up as a poorly defined "privacy problem".
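To make the polyfill point concrete, here's a minimal sketch of the feature-detect-then-patch pattern (`Array.prototype.at` is purely an arbitrary example, not a claim about what any particular site ships):

```ts
// Feature detection: only patch the API if the engine lacks it.
// (Cast to `any` so this sketch compiles regardless of the configured TS lib.)
const arrayProto = Array.prototype as any;
if (typeof arrayProto.at !== "function") {
  arrayProto.at = function (this: unknown[], index: number) {
    const i = Math.trunc(index) || 0;
    const k = i < 0 ? this.length + i : i;
    return k >= 0 && k < this.length ? this[k] : undefined;
  };
}

// After this runs, arr.at(-1) returns the last element whether the method is
// native or polyfilled; once libraries drop the shim because their supported
// baseline moved on, an engine that never implemented it breaks again.
```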
This comment gets some pretty important fundamentals wrong.
> The real bugbear here is JS though, HTML and CSS are complex but workable. JS is an ever-moving target
What you characterize as "JS" is, in reality, more HTML and CSS than JS. JS is a language. The fact that all the behavioral details of the HTML and CSS objects and related host objects have bindings available to JS programs does not make those things "JS"...
Doing a new JS engine from scratch is an order of magnitude easier than doing a browser engine. It is directly analogous to the eminently tractable "building a compiler" problem that the other commenter mentioned.
Fair. I meant JS here as in "fully DOM compatible, as-used-in-your-browser JS". JS engines themselves aren't that hard to make (I think there's about 9 or 10 actively maintained ones?), but to make one that's usable in situations that aren't things like node or as a sub-language in a different project... that's far more difficult.
> "fully DOM compatible, as-used-in-your-browser JS"
That's still wrong.
s/DOM//
s/JS/DOM/
Continuing to say JS when you're really talking about what is, again, still in the land of HTML, CSS, etc just confuses things. Viz:
> to make [a JS engine] that's usable in situations that aren't things like node or as a sub-language in a different project... that's far more difficult
It's really, really not about JS. You don't make a browser that's compatible with the Wild Wild Web by adding stuff to the JS engine. You do it by implementing moar browser.
I think the distinction comes from the fact that an unfinished compiler is just unfinished; as a developer you know that, and either you contribute or you suck it up.
A browser that's unfinished really can't be used by users at all. Either it lacks security, so nobody should use it, or it lacks vital features (of the spec, not end-user features), so nobody can really use it, because every time a website relies on that API something doesn't work.
I don't think anyone has argued that building a browser from scratch is impossible, it's clearly not, just that building a competitive engine from scratch is impossible. SerenityOS is a sort of very cool art project, it's not attempting to justify itself in any specific way. If they make an engine that's 1% as good as Blink, works OK for the sites the authors personally care about and eventually they lose interest, OK, so what, no big deal.
It depends on how much of a browser you want to implement, I guess. Comparing it with a compiler is a skewed comparison, I think; compilers are built as part of many people's college/university education, but are only a small part of turning a programming language into working software. Likewise, I'm sure most developers on here could feasibly write a web browser that can fetch websites and render the HTML.
But that's just one aspect, next you need to add support for CSS [1] and Javascript [2], each of which has had lifetimes of work invested in the standards and implementations.
So yeah, while it's doable to build a new browser, if you want to build a big one that has feature parity or is on-par with the existing browser landscape, you need a large team and many years of work. And that's just the practical aspect, the other one is, would a new browser actually be better? Could it compete with the existing market? So many players have just given up over time.
Big problem is the constant feature churn in the web space. Getting from zero to browser is probably doable. Staying at the mark with the ever shifting CSS standards and the constant deluge of web extensions is hard and expensive.
I don't think that's true at all. We get only a handful new CSS features every year, and features introduced today are much more carefully defined than the ad-hoc features of yesteryear. Implementing them is pretty straightforward. Certainly not more difficult than any of the million other things you have to do when building an OS from scratch.
The difficulty of building a state-of-the-art browser is almost entirely about performance. Everything else is straightforward by comparison.
> features introduced today are much more carefully defined than the ad-hoc features of yesteryear.
Many of them are just as ad-hoc, even if they are better defined, and meant to cover some holes in previous ad-hoc specifications. For example, the entire `subgrid` spec is patching one specific hole which actually has a proper general definition: "These <children> however are independent of the parent and of each other, meaning that they do not take their track sizing from the parent. " [1]
So instead of solving that general problem, we have a hyper-specific patch for a single feature. Which will definitely clash with something else in the future.
I mean, the entire web components saga is browser developers patching one hole after another that exist only because the original implementation was just so appalling.
> The difficulty of building a state-of-the-art browser is almost entirely about performance.
But that performance is directly affected by the number of specs and features.
What is harder, starting with nothing and building the equivalent of 2020 Chromium from scratch, or taking Chromium from 2020 and extending it with the new features to get it up to date with today's Chromium?
The former is 100x harder than the latter, and the prior statement that new CSS/JS features are a burden to keep up with is patently absurd. Because it's a tiny amount of work relative to the total work required to make a browser. (But still hard in the absolute sense, because browser engines are among the most complicated software projects.)
Posing the problem as making just one change hides the problem. The problem is long-term. A constant deluge of externally driven changes inevitably creates a quagmire of technical debt, since you have no choice but to implement them regardless of whether your chosen architecture is suitable for the change.
> the prior statement that new CSS/JS features are a burden to keep up with is patently absurd.
Chrome ships up to 400 new APIs a year (that is JS, CSS etc.)
Safari and Firefox ship 150 to 200 new APIs a year. [1]
Even Microsoft gave up on trying to keep up with browser development and switched to Chromium.
> Because it's a tiny amount of work relative to the total work required to make a browser.
That is, like, the primary work required. And many of those things often don't even have a solution until someone finally figures them out in a performant manner (like CSS's :has)
> I'm finding it weird that unlike other non-trivial projects like OSes or compilers, people often discourage building web browser engine because it is "hard" or something like that like... how is it different from building a compiler?
A conformant C++ compiler is in the same ballpark as a browser, but a naive C compiler is orders of magnitude simpler.
Recreating the Windows OS is in the same ballpark as a browser, but a simple OS that boots and runs command-line apps is orders of magnitude simpler.
People don't casually start new C++ compilers or projects such as WINE, but they do start toy compilers and OSes all the time.
Compilers aren't routinely built anew either, except for new languages. Browsers are rarely written around a new language. (If you did invent a new language to substitute for HTML+CSS+JS, you'd likely want to implement it on the existing stack first, kind of like an equivalent of "compiles to C".)
I've been working on CSS Layout as a library recently[0] (we have Flexbox and CSS Grid support so far). It seems to me that librifying everything (much like is already done with JS engines) could be a good approach to making building a new browser engine more accessible.
Because that way anyone wanting to build a new one can start by pulling in a bunch of libraries (layout, rendering, etc.), and then just customise the bits they need (ideally publishing them as a new interoperable library that others can also use).
> It seems to me that librifying everything (much like is already done with JS engines) could be a good approach to making building a new browser engine more accessible. ... Because that way anyone wanting to build a new one can start by pulling in a bunch of libraries (layout, rendering, etc.), and then just customise the bits they need (ideally publishing them as a new interoperable library that others can also use).
I think that you are right; this is what will be needed. However, that alone won't do, because the libraries also need to be written well: not too slow or inefficient, and not too limited in how much they can be customized.
And things need to be properly separated. (Looking at your examples, it seems properly separated to me. You can define styles independently of parsing them, which improves efficiency as well as allowing other steps to be added in between, such as "meta-CSS" if desirable.)
In some cases, it may be desirable to modify parts of the libraries, although then it may be necessary to maintain a fork of that library, which is not always desirable.
(For example, I may want to add proper support for non-Unicode text, and the ability to customize text layout functions, including all possible text directions (vertical, horizontal, boustrophedon, etc.). Adding other CSS rules might also be needed for some other purposes. And then, we will also need to do accessibility features.)
(I like to program in C; looking at the issues, it looks like they might be added, so that can be good; unfortunately, Rust has a Unicode string type, and this can be problematic even when using C, unless the Rust programming is done very carefully to avoid this problem.)
We are planning to add a C API to Taffy, but tbh I feel like C is not very good for this kind of modularised approach. You really want to be able to expose complex APIs with enforced type safety and this isn't possible with C.
Question about implementation: since you are not building a browser (so I expect HTML is out of your scope), are you able to use an existing test suite to compare your implementation?
We have our own test suite (originally derived from the test suite of Meta's Yoga layout library [0]) which consists of test fixtures that are small HTML snippets [1] and a test harness [2] that turns those into runnable tests, utilising headless Chrome both to parse the HTML and to generate the assertions based on the layout that Chrome renders (so we are effectively comparing our implementation against Chrome). We currently have 686 generated tests (covering both Flexbox and CSS Grid).
We would like to run the Web Platform Tests suite [3] against Taffy; however, these are not in a standard format and many of the tests require JavaScript, so we are not currently able to do that.
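Roughly, the approach looks like this (a simplified sketch using Puppeteer, not our actual harness; the fixture wrapper id is made up):

```ts
import puppeteer from "puppeteer";

// Load a small HTML fixture in headless Chrome and capture the boxes it
// computes; these become the expected values for generated layout tests.
async function captureLayout(fixtureHtml: string) {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.setContent(fixtureHtml);

  // "#test-root" is a hypothetical wrapper element present in each fixture.
  const boxes = await page.evaluate(() =>
    Array.from(document.querySelectorAll("#test-root, #test-root *")).map((el) => {
      const r = el.getBoundingClientRect();
      return { tag: el.tagName, x: r.x, y: r.y, width: r.width, height: r.height };
    })
  );

  await browser.close();
  return boxes; // serialize these into the generated test's assertions
}
```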
I recommend listening to the Corecursive podcast interview with Andreas Kling (the leader of that project and the author of the post) to learn about his background working on web browser code:
https://corecursive.com/serenity-os-with-andreas-kling/
Others have already mentioned that performance is one of the hard parts.
Another aspect is that for any complex, large-scale project like a browser, much, if not most, of the effort is actually in the long tail: making 90% or even 99% of websites work is probably about as hard as making the remaining 1% work.
So while the team can probably cruise through the current-gen specs and popular sites like Discord/Twitter, what's left is going to be a nightmare to manage at the end.
But again, nothing is impossible, and I really look forward to having a new browser engine in the wild.
Is this something you know from experience or are you armchair guessing?
If I recall correctly, the work Andreas did at Apple was mostly focused on performance, and Safari has long had a reputation for excellent performance. Maybe you’ve also done that type of work, but otherwise I’ll trust his judgement.
My experience with big projects would confirm this. Getting the basic things implemented is quite fast, but the devil is in the details and they can drag on for years, if you didn't account for all of them in the beginning.
And there are a hell of a lot of details in the platform called the web.
But I would think that in this case they have no intention of going to 100% at all costs, supporting every broken piece of web garbage out there.
The goal is to implement the W3C specs. (They are even working on fixing the specs.)
Oh, and Kling specifically worked on browsers before, so that is a good base.
"I've had the opportunity to work on production browsers for many years (at Apple and Nokia)"
I made a comment about why a browser is hard in general. I didn't in any way suggest or imply that this team would struggle with performance, so I'm not sure why their (amazing) background would be relevant.
Please re-read your own comment. You describe two hard things, performance and long-tail compatibility, and literally state that “the team” will have a nightmare left after an initial cruise.
Having a better team just means they can manage "the hard part/nightmare" better; it doesn't change where the hard part is.
And I mention these two things because the article itself said they're going to "[f]ocus on vertical slices" first while "[d]eferring on performance work". So I was pointing out that it basically means they started with the easy part (nothing wrong with that).
I still fail to see which point of mine you disagree with, other than by argumentum ad verecundiam.
Web browsers are a moving target, just like operating systems. Anyone can 'build their own', but I'd also say that it is close to impossible to build a secure and viable competing web browser that correctly implements the specification better than Chrome.
The two most important keywords in Drew's blog post are *serious* and *security*. There is not one mention of either of those words in this blog post; hence Drew's points still stand unchallenged.
You can try, but so did Servo, which was a 'serious attempt', and not even the Rust hype could convince the masses that it was better than Chrome.
It moves slower than people assume; I installed Opera 12 last year for the craic – the last version built on their Presto engine, released almost ten years ago – and it works surprisingly well with many sites. I did have to use mitmproxy to rewrite some trivial stuff like some CSS prefixes and s/(let|const)/var/ in JS. Flexboxes are supported, but grid isn't so that failed for some sites.
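The rewrite itself is trivial - not my actual mitmproxy script (that one is Python), but roughly this kind of transform, sketched here in TypeScript:

```ts
// Crude downgrade of modern JS so a Presto-era engine can at least parse it.
// A real proxy script would apply this to response bodies; a regex like this
// will happily mangle string literals too, but it was good enough for the
// experiment described above.
function downgradeJs(source: string): string {
  return source.replace(/\b(let|const)\b/g, "var");
}

console.log(downgradeJs("const a = 1; let b = 2;")); // "var a = 1; var b = 2;"
```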
> Drew's points still stands unchallenged
His points are based on a faulty assumption to start with: he counts all sorts of documents, but that count is spectacularly wrong as it counts many things it shouldn't. I mentioned this at the time: https://news.ycombinator.com/item?id=22617721
Is it a large project? Sure, as many software projects are. But "impossible" and "comparable to the Manhattan project"? Certainly not; it's just that there's not a whole lot of money to be made with a new browser engine or other broadly shared motivation.
And Drew himself answered you at the time on the points made in your comment: things you said shouldn't be included that actually should be, and others you said were incorrectly included that were in fact excluded in the first place.
Those replies are handwavy and offer no convincing defence at all. In just a few minutes I was able to reduce the 1,217 URLs to 434 by simply excluding outdated or non-applicable stuff. That's about a third and includes some pretty large documents, and that's just with a quick check. The list is unambiguously categorically wrong and anyone who seriously looks at it and comes to a different conclusion is suffering from serious confirmation bias.
Whether the web is "too complex" is a different matter and open to interpretation, as "too complex" is subjective. But the data is very wrong and therefore the article is wrong. A "correct" conclusion with faulty arguments is just as worthless as an incorrect conclusion: any possible solution depends on a correct understanding of the situation. "Global warming happens because of pornography, therefore we must ban pornography" is just as useless as "global warming is a fake fraud" even though the conclusion of the first is correct.
Why not post this list for the rest of us to see (and check), then? But beforehand: that "outdated" stuff (such as your HTML 3.1 example) may still be applicable, and excluding it means your approach is incorrect right off the bat.
The HTML 3.1 specification is essentially irrelevant for implementing a modern browser; and you certainly don't need HTML 5.0, and HTML 5.1, and HTML 5.2, and HTML 5.3, and HTML 4.0, and HTML 4.01, and HTML 3.2, and XHTML 1.0, and XHTML 1.1, and XHTML 2. These are large documents; possibly the largest in the set.
It takes a minute to spot-check; some specific examples were provided in the previous thread. I don't have the list any more and can't be bothered to recreate it; what value is there if you can just check Drew's list – which is really not that hard? I also have no idea how correct it is, exactly; I suspect the actual number would be even lower still.
Isn't that the real problem here? Nobody cares if your browser fails to render that page, because you strictly adhere to the standard while Chrome just happily deals with broken documents (or worse: Chrome requiring documents to be slightly broken).
Chrome is the sole benchmark. If it works in Chrome, it's fine, if it doesn't the site is broken. Standards never enter the discussion.
I think that there will be a new successful browser one day - and it will be disruptive. But it needs two properties:
1. A unique use case or feature that cannot be easily implemented in the existing browsers. Something that breaks the current architecture and turns the current use cases into afterthoughts. ("oh, yeah, right we actually need to render html somehow at some point, can the intern do it?")
2. A significant breakthrough in software engineering productivity, a major step in terms of abstraction and safety. Something like the combination of a LLM and formal methods.
This browser does not check these two boxes (using C++, albeit hopefully a more modern dialect, and targeting plain old browsing). So it is certainly great for the spec - and should be paid for by the W3C, IMO - and great for the people developing this as an exercise, but it will never dethrone Chrome.
I think a potentially interesting use case for Ladybird is as a "contenteditable" polyfill. With their dependency-free stack I'm guessing it's not out of the realm of possibility to compile it to WASM and HTML canvas.
Having to only target one rendering engine when developing a rich text editor would be much better than the current nightmare it is.
(There would be an accessibility problem to solve, we need some new APIs for screen readers and canvas)
It's a lot of fun to watch them build Ladybird, and a testament to what a small passionate team can do.
Happy to admit it's a crazy idea, and it's not something that I would want to see as a usual way to build sites. But for small areas of web apps where compatibility is difficult it does make sense.
Google Docs used to be contenteditable based, but moved to a custom rendering engine. They are a large enough company to be able to invest in that. Small businesses aren't, and have to rely on content editable.
Ladybird as a contenteditable polyfill would help smaller teams, or single developers, achieve the same, while also building on the existing tooling and APIs for contenteditable.
Isn't this actually proof that creating desktop-grade applications, like a graphical word processor from the late 90s with some collab backend, is still infeasible with web tech?
Doesn't it also prove that it's still easier and faster to create a proper app and GUI toolkit yourself and just render pixels to the screen (as all desktop GUI toolkits do these days) instead of fighting the browser tech's idiosyncrasies, even though a proper app and GUI stack isn't trivial in itself?
> Doesn't it also prove that it's still easier and faster to create a proper app and GUI toolkit yourself and just render pixels to the screen (as all desktop GUI toolkits do these days) instead of fighting the browser tech's idiosyncrasies, even though a proper app and GUI stack isn't trivial in itself?
jmo, but i think so as well....
tho i wonder what the performance/battery-life implications of everything doing that might be... perhaps you could have a 'libhtml' for static sites and documents, and different ones for more interactive apps etc
Maybe not exactly the same. But whatever could have been improved on that base in the last 15 years.
The point is: we had much saner tech. Now it's just complete craziness, and still you can't even build a word processor like the one that ran on Windows 95. This says everything about the state of web tech for application development. (And no, this tech is rotten from the roots, so you can't improve on it. It'll only get more crazy and shitty if you try further.)
As an industry we’ve taken a long and windy path to delivering full applications (with a local cache) on every load to a sandboxed environment that is mostly compatible across environments.
While it’s easy to take shots at the current state, I find it hard to imagine another path to delivering a cross platform application over the network. What we have now is actually pretty rad.
We built it stone by stone, incrementally, and now I don’t have to package my application for N platforms unless I want to meet users in their app stores.
If we spent all the effort we used up on making JavaScript work on Java instead, I’d be typing this from my Moon habitat.
But we instead chose a pig, buried it under layers of lipstick, and strapped a jet engine onto it. Sure, it’s airworthy, but was it the best way to allocate resources?
I wouldn't say we chose a pig and buried it under layers of lipstick. I think it's more accurate to say people realized:
1) You could deliver software via a browser with some clever hacking
2) That delivering software via a browser had some extremely attractive properties compared to other approaches
Those clever hacks turned into real applications and those became real products. As people built applications on top of browsers, the browsers evolved to be a better environment for building applications. It was very organic.
The actual language we ended up using is just a byproduct IMO. I don't think people chose JavaScript, they chose the browser and the browser had JavaScript. And since the browser had JavaScript, we kept pushing the limits of JavaScript because we needed to deliver applications to the browser.
Honestly hard to see this happening efficiently in any other way. All things considered, the web platform is pretty fantastic compared to the software distribution story in every other ecosystem. I'd argue that we allocated resources pretty well on this one!
I didn't miss them - I don't have fond memories of them as either a user or a developer. But you're right, they were there. Which is interesting: Java Applets were available from close to day one right along side JavaScript, and yet...
There are two use cases for the web. There is the document web and the application web. Both are equally valid. It is absolutely an application platform, as evidenced by the fact that people use it to deliver applications. An ever-increasing percentage of desktop software is moving to the browser, to the point where many users only need a web browser. I'd argue the only reason mobile hasn't followed suit is the non-market forces behind the app store model.
It is one of the best application platforms for both users and developers. I can write my software exactly one time and it will run on every platform, instead of separate applications for Android, iOS, Mac, Windows, FreeBSD, Linux, etc. etc.
I can assume my users have a web browser - because they do. For interpreted languages (and VMs) I either have to walk my users through setting up the interpreter or bundle it into the distributable.
Compared to everything else I've worked with, the web as an application delivery platform is great. I write my code, send someone a link, and they are running my app.
> Java Applets were available from close to day one right along side JavaScript, and yet...
And yet, what?
But the actual question is: Why? ;-)
> There is the document web and the application web. Both are equally valid.
No, they aren't. The tech was built to support only a lightweight version of the first one. Everything on top is just a great hack, and pure insanity from the technical viewpoint!
> It is absolutely an application platform, as evidence by the fact people use it to deliver applications.
People do a lot of very stupid things. That's not evidence that doing stupid things is a good idea…
> An ever increasing percentage of desktop software is moving to the browser
Nobody is doing that because web-tech is a great application platform. It's actually exactly the other way around: Most people complain about the extremely crappy tech, but still do it for other reasons.
> I'd argue the only reason mobile hasn't followed suite is the non-market forces behind the app store model.
This makes no sense.
It would be much cheaper for developers not to pay the road toll to the app-store owners (-30%!), and they would at the same time remain in control of their own products, if they "delivered" web apps. But most mobile developers don't do that, for technical reasons: web apps are just crap, and especially on mobile it glaringly shows.
> I can write my software exactly one time and it will run on every platform
You mean, like JVM applications already did 25 years ago?
> Compared to everything else I've worked with, the web as an application delivery platform is great. I write my code, send someone a link, and they are running my app.
What's again the difference here to Java WebStart?
BTW: Installing a JRE is exactly the same kind of one-off effort as installing a web browser…
---
We lost between 20 and 30 years once again just for political reasons!
Only to arrive at the worst rip-off of some concepts which were already almost "working fine".
I admit that's a recurring pattern. It's always the most terrible tech that comes out on top in the end, for completely insane "reasons". The market just always favors the cheapest shit that can be rolled out with the least effort. It was like that, for example, with things like C or UNIX. Now "worse is better" has become a kind of proverb in some circles…
To be of the opinion that the technically best solutions win in the market is IMHO a sign of not much experience in this world. So far it has been the exact opposite in almost all relevant cases, because most of the time the cheapest shit wins on the market.
This is exactly the type of thing that makes me so excited about LLMs. If they make us widely more productive, we can build huge things like this and take on the monopolies in the space. It might lead to a wave of "thought impossible to build" products.
I've been wondering if at some point we could eventually train the models in first-order terms. I.e. input some HTML/JS/CSS + user state (i.e. scroll position, x/y dimensions) and then it outputs a final rasterized frame representing the current state of the viewport. The training data would be fairly obvious and easy to collect.
Failing that, an optimized binary blob that could achieve the same using training data over modern browser specifications. If you go to ChatGPT and start talking about ISO32000-compliant implementations and poke at the edges, you can get it to start writing a PDF engine pretty quickly.
But their product survived and they ended up being wildly successful.
In a lot of projects, early optimization hurt the development speed and maintainability, sometimes killing the product.
It is easier to see performance bottlenecks once a product is widely used than to add optimizations everywhere we suspect a problem might appear later.
Yes. Never been easier to build a browser than today with great open source engines. Writing an engine from scratch is the hard part. Extremely labour-intensive to build and then maintain. People usually conflate both.
I think there is an important aspect of it which Andreas glossed over - do it incrementally. While Ladybird is not yet on par with more mature engines, it already works just fine for a lot of webpages.
I mean, I don't see highly advanced/complicated features like WebGL used in my day-to-day browsing; it's tech demos and the like linked from HN at best. A new browser can do without that for a long time.
I wish there were an auto-migration framework: if one software product, like my browser, is corrupted (aka sells out), it packs my settings into a neutral interface file and automatically migrates to a still-untouched browser, or offers me a list to choose from. Like nomadic herds of animals, hunted by predators, ever elusive, never caught...
Building a web browser is a difficult but attainable task. On the other hand, getting the UX right is a gargantuan task that even big tech struggles with.
Actually, UX is much easier to iterate on. There are countless Chromium-based browsers that are mainly competing on UX. They all render websites just fine; UX is the only thing they compete on. Mostly, there is a lot of imitation and not a lot of innovation in that space. It seems people like tabs at this point, and things like bookmarks and back buttons. There are only so many ways to arrange those features on a screen, and we've seen most of those over the past 20 years.
The real difficulty with browsers is building a better one than the existing ones. If you make a new browser, it does exactly the same thing as the other ones. It's a great technical accomplishment, but it has very low value, which is why nobody bothers at this point.
At this point there are only three browser engines with any audience still worth talking about: Chromium, Safari/WebKit, and Firefox/Gecko. Obviously the first two are related, but they forked so long ago that they are quite different at this point. In terms of what they do there are some minor differences, but they basically render the same websites in more or less the same ways. There is very little point in picking one over the other.
I actually use Firefox and I'm pretty happy with it. I don't think it does a lot better/different than the other two at this point but I like not selling out completely to Apple/Google. Google treats me like a product rather than a user and Apple seems more interested in telling me what I can't do rather than enabling me to do things I want to do. But objectively, both do a fine job of rendering websites and allowing me to browse the web. Just like Firefox does. And given that there is no practical difference, I choose to use Firefox.
The problem isn't the web browser. The problem is that your Javascript engine has to be staggeringly stellar or your web browser will feel like it's encased in molasses.
I wish them luck, though. We could use some real competition in the web browser space again.
honestly the web is so broken, i think it is beyond repair, i just want to run the website through some LLM to get the content out and show it in lynx and be done
i dont want to consent to be tracked, i dont want to login, i dont want a personalized feed, i dont want to subscribe to your newsletter, i dont want your ads, ethical or not
js is still an issue, but maybe a day will come when i can say to the model 'pretend you are a js interpreter; what is the text output of this minified react garbage? show it as markdown' and just pipe it to lynx or w3m or, worst case, eww
If your team's just 5 people, having a strong leader can make the difference. But with a whole business, it's no longer about one strong leader. The more people you have, the more you have to invest in levelling up individual performance. Almost universally that doesn't scale, so instead you work on process, and use that process, not individual leadership, to ensure better results.
This is great. Refreshing to read something which talks about complexity as a real and important issue (not as a positive or neutral aspect of a system). I've been hoping for a browser like this since the day I tried to download the Chromium repo and found out how large it was. Also, I noticed that it had a large number of external dependencies which made it very difficult to actually dig into the code.
Modern HTML browsers have become Swiss Army Spaghetti. Perhaps we need to split the standard into smaller components so one size doesn't have to fit all. I suggest at least 3 sub-standards:
A) Document-oriented standard. Perhaps HTML standards are "good enough" for this?
B) Media/Art/Gaming.
C) Business & Data CRUD/GUI
(And don't link that XKCD cartoon about 15 standards. There are zero for these categories.)
Since there are already plenty of HTML browsers, instead explore an unserved need, such as a stateful GUI markup browser & standard. HTML/DOM is missing many expected GUI idioms, and has an inherent text positioning flaw:
GUI's, desktops, and mice are still needed for biz and productivity. HTML browsers have been a goofy mess for this, requiring bloated buggy JS libraries with long learning curves. Let's Make Gui's Great Again! (No, I'm not a Don fan, BTW, but his trollisms are catchy.)
5000 feet up the mountain: "I'm climbing Everest solo without oxygen even though it's supposed to be impossible. How come I'm making such good progress?!"
I would love to see this adopted and ported as an alternative, lightweight browser for other "esoteric" OSes like Haiku, legacy OS X, and even OpenBSD (which has performance issues with Firefox).
I absolutely love that Andreas Kling doesn't care about it being hard/impossible, and just dives straight in with pure optimism. It's absolutely wonderful to see the joy and positivity.
Andreas, if you're reading this… have you folks thought about building a new ACID-type test that covers the gaps in the existing ones? Seems like it would be incredibly useful.
I know that this is more of a "can we do it?" experiment, so I feel kind of bad for criticising. It's a great feat to get this far.
But I was disappointed that it just crashes on any github page... and SerenityOS github is literally the first link on the Ladybird default homepage :)
edit: oh, it doesn't crash on the github page; it crashes on the github issues page.
He's clearly earned that title. He demonstrably has the experience - and inspiring and coordinating volunteer contributions to a project of this scale is an extraordinarily difficult leadership challenge.
Why learn to play an instrument when you can just buy a CD?
The goal is not to have a browser, but to build one. This browser is affiliated with the SerenityOS project, which is reimplementing an entire desktop OS and all applications from scratch.
> Also, since Ladybird is an offshoot from the SerenityOS project, it shares the same culture of accountability and self-reliance. We avoid 3rd party dependencies and build everything ourselves. In part because it’s fun, but also because it creates total accountability for what goes into our software.