
I'm very skeptical about the benefits of a binary JavaScript AST. The claim is that a binary AST would save on JS parsing costs. However, JS parse time is not just tokenization. For many large apps, the bottleneck in parsing is instead actually validating that the JS code is well-formed and contains no early errors. The binary AST format proposes to skip this step [0], which is equivalent to wrapping function bodies with eval… That would be a major semantic change to the language and should be decoupled from anything related to a binary format. So IMO the proposal conflates tokenization with changing early-error semantics. I'm skeptical the former has any benefits, and the latter should be considered on its own terms.
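To make "early errors" concrete (my own toy example, not from the proposal): a duplicate let binding is a SyntaxError the engine must report before running a single statement, even if the offending function is never called. Deferring that check is roughly what wrapping the body in eval would do.

    // Today this whole script is rejected at parse time:
    function neverCalled() {
      let x;
      let x;                  // early error: 'x' has already been declared
    }
    console.log("unreached"); // never runs, because the parse failed

    // With the check deferred, the log would run and the SyntaxError would
    // only surface if neverCalled() were ever invoked.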

Also, there's immense value in text formats over binary formats in general, especially for open, extensible web standards. Text formats are easier to extend as the language evolves because they typically have some amount of redundancy built in. The W3C outlines the value here (https://www.w3.org/People/Bos/DesignGuide/implementability.h...). A text format for JS also means engines/interpreters/browsers are simpler to implement, and therefore that JS code has better longevity.

Finally, although WebAssembly is a different beast and a different language, it provides an escape hatch for large apps (e.g. Facebook) that want to go to extreme lengths in the name of speed. We don't need to complicate JavaScript when such a powerful mechanism, tuned to perfectly complement it, already exists.

[0]: https://github.com/syg/ecmascript-binary-ast/#-2-early-error...




Early benchmarks seem to support the claim that we can save a lot on JS parsing costs.

We are currently working on a more advanced prototype on which we will be able to accurately measure the performance impact, so we should have more hard data soon.


It seems like one big benefit of the binary format will be the ability to skip sections until they're needed, so the compilation can be done lazily.

But isn't it possible to get most of that benefit from the text format already? Is it really very expensive to scan through 10-20MB of text looking for block delimiters? You have to check for string escapes and the like, but it still doesn't seem very complicated.


Well, for one thing, a binary format’s inherent “obfuscatedness” actually works in its favor here. If Binary AST is adopted, I’d expect that in practice, essentially all files in that format will be generated by a tool specifically designed to work with Binary AST, that will never output an invalid file unless there’s a bug in the tool. From there, the file may still be vulnerable to random corruption at various points in the transit process, but a simple checksum in the header should catch almost all corruption. Thus, most developers should never have to worry about encountering lazy errors.

By contrast, JS source files are frequently manipulated by hand, or with generic text processing tools that don’t understand JS syntax. In most respects, the ability to do that is a benefit of text formats - but it means that syntax errors can show up in browsers in practice, so the unpredictability and mysteriousness of lazy errors might be a bigger issue.

I suppose there could just be a little declaration at the beginning of the source file that means “I was made by a compiler/minifier, I promise I don’t have any syntax errors”…

In any case, parsing binary will still be faster, even if you add laziness to text parsing.


> a simple checksum in the header should catch almost all corruption

For JavaScript, you have to assume the script may be malicious, so it always has to be fully checked anyway.

It's true that the binary format could be more compact and a bit faster to parse. I just feel that the size difference isn't going to be that big of a deal after gzipping, and the parse time shouldn't be such a big deal. (Although JS engine creators say parse time is a problem, so it must be harder than I realise!)


> For JavaScript, you have to assume the script may be malicious, so it always has to be fully checked anyway.

The point I was trying to make isn't that a binary format wouldn't have to be validated, but that the unpredictability of lazy validation wouldn't harm developer UX. It's not a problem if malicious people get bad UX :)

Anyway, I think you're underestimating the complexity of identifying block delimiters while tolerating comments, string literals, regex literals, etc. I'm not sure it's all that much easier than doing a full parse, especially given the need to differentiate between regex literals and division...


I was figuring you could just parse string escapes and match brackets to identify all the block scopes very cheaply.

Regex literals seem like the main tricky bit. You're right, you definitely need a real expression parser to distinguish between "a / b" and "/regex/". That still doesn't seem very expensive though (as long as you're not actually building an AST structure, just scanning through the tokens).

Automatic semicolon insertion also looks fiddly, but I don't think it affects bracket nesting at all (unlike regexes, where you could have an orphaned bracket inside the literal).
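To illustrate both points with a contrived snippet (mine, not from any real app): a regex literal can legally contain an unbalanced brace, and the same "/" can begin either a regex or a division, so a scanner that only matches brackets and string escapes will mis-nest scopes unless it effectively parses expressions.

    function f(x) {
      return /}/.test(x)      // "}" inside a regex literal
        ? x.length / 2        // "/" as plain division
        : 0;
    }
    console.log(f("a}b"));    // 1.5

    // A scanner that skips strings but doesn't know which "/" starts a
    // regex sees two "}" for one "{" here and mis-nests the blocks.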

Overall, digging into this, it definitely strikes me that JS's syntax is just as awkward and fiddly as its semantics. Not really surprising I guess!


Early error behavior is proposed to be deferred (i.e. made lazy), not skipped. Additionally, it is one of many things that require frontends to look at every character of the source.

I contend that the text format for JS is in no way easy to implement or extend, though I can only offer my personal experience as an engine hacker.


If early errors are deferred, then they're no longer early... that's all I meant by skipped. It's still a semantic change that's unrelated to a binary AST.


Indeed it's a semantic change. Are you saying you'd like that change to be proposed separately? That can't be done for the text format, for obvious compat reasons. It also has very little value on its own, as it is only one of many things that prevent actually skipping inner functions during parsing.


Our goal is not to complicate Javascript, but to improve parse times. Fundamentally that boils down to one issue: engines spend too much time chewing on every byte they load. The proposal then is to design a syntax that allows two things:

1. Allow the parser to skip looking at parts of code entirely.

2. Speed up parsing of the bits that DO need to be parsed and executed.

We want to turn "syntax parsing" into a no-op, and make "full parsing" faster than syntax parsing currently is - and our prototype has basically accomplished both on limited examples.
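As a rough sketch of why skipping can become a near no-op (a hypothetical length-prefixed layout of my own, not the actual Binary AST encoding): if each function node carries its body length, a lazy parser jumps past the body in O(1) instead of scanning every character for the matching brace.

    // Hypothetical layout: [tag: 1 byte][bodyLength: u32 LE][body bytes]
    const FUNCTION_TAG = 0x01;

    // Return the offset just past this node, without decoding its body.
    function skipNode(view, offset) {
      const bodyLength = view.getUint32(offset + 1, true);
      return offset + 1 + 4 + bodyLength;
    }

    // A function node with a 3-byte body ends at offset 8.
    const buf = new Uint8Array([FUNCTION_TAG, 3, 0, 0, 0, 0xaa, 0xbb, 0xcc]);
    console.log(skipNode(new DataView(buf.buffer), 0)); // 8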

> A text format for JS also means engines/interpreters/browsers are simpler to implement, and therefore that JS code has better longevity.

As an implementor, I have to strongly disagree with this claim. The JS grammar is quite complex compared to an encoded pre-order tree traversal. It's littered with tons of productions and ambiguities. It's also impossible to do one-pass code generation with the current syntax.
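A classic instance of the lookahead problem (standard JS, my own illustration): the same prefix can be a parenthesized expression or an arrow function's parameter list, and the parser only finds out after the closing ")".

    let a = 1, b = 2;
    (a, b);            // parenthesized comma expression, value 2
    (a, b) => a + b;   // identical prefix, but now a parameter list

    // Engines handle this with a cover grammar: parse permissively,
    // then reinterpret once "=>" is (or isn't) seen.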

Parsing an encoded pre-order tree traversal doesn't even require a general context-free parser (it can be implemented on top of a deterministic PDA). It literally falls into a simpler class of parsing problems.
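For comparison, here is a minimal decoder for a toy pre-order encoding (my own invented format, not the proposal's wire format): each node is a tag byte followed by a child count, and one linear pass rebuilds the tree with no lookahead and no ambiguity.

    // Toy encoding: each node is [tag, childCount, ...children], pre-order.
    function decode(bytes, pos = 0) {
      const node = { tag: bytes[pos], children: [] };
      let next = pos + 2;
      for (let i = 0; i < bytes[pos + 1]; i++) {
        const [child, after] = decode(bytes, next);
        node.children.push(child);
        next = after;
      }
      return [node, next];
    }

    // decode([1, 2, 5, 0, 6, 0])[0]
    //   => { tag: 1, children: [{ tag: 5, ... }, { tag: 6, ... }] }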

> The binary AST format proposes to skip this step [0] which is equivalent to wrapping function bodies with eval…

This really overstates the issue. One can equally rephrase that statement as: if you are shipping JS files without syntax errors, then the behaviour is exactly identical.

That brings into focus the real user impact of this: developers who ship syntactically incorrect JavaScript to their users will have their pages fail slightly differently than they do currently.

Furthermore, the toolchain will simply prevent JS with syntax errors from being converted to BinaryJS, because the syntactic conversion is only specified for correct syntax - not incorrect syntax.

The only way you get a "syntax" error in BinaryJS is if your file gets corrupted after generation by the toolchain. But that failure scenario exists just the same for plaintext JS: a post-build corruption can silently change a variable name and raise a runtime exception.

So when you trace the failure paths, you realize that there's really no new failure surface area being introduced. BinaryJS can get corrupted in exactly the same ways, with the same outcomes, as plaintext JS can right now.

Nothing to worry about.

> We don't need to complicate JavaScript when such a powerful mechanism, tuned to perfectly complement it, already exists.

We need to speed up Javascript more, and parsing is one of the longest-standing problems; it's time to fix it.

Wasm is not going to make regular JS go away. Codebases in JS are also going to grow, and as they grow, the parsing and load-time problem will become more severe. The onus is on us to address it for our users.



