
I'm very skeptical about the benefits of a binary JavaScript AST. The claim is that a binary AST would save on JS parsing costs. However, JS parse time is not just tokenization. For many large apps, the bottleneck in parsing is instead actually validating that the JS code is well-formed and contains no early errors. The binary AST format proposes to skip this step [0], which is equivalent to wrapping function bodies with eval… That would be a major semantic change to the language and should be decoupled from anything related to a binary format. So IMO the proposal conflates tokenization with changing early-error semantics. I'm skeptical the former has any benefits, and the latter should be considered on its own terms.
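To make "early errors" concrete (my own toy example, not from the proposal): a duplicate let binding is a SyntaxError the engine must report before running a single statement, even if the offending function is never called. Deferring that check is roughly what wrapping the body in eval would do.

    // Today this whole script is rejected at parse time:
    function neverCalled() {
      let x;
      let x;                  // early error: 'x' has already been declared
    }
    console.log("unreached"); // never runs, because the parse failed

    // With the check deferred, the log would run and the SyntaxError would
    // only surface if neverCalled() were ever invoked.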

Also, there's immense value in text formats over binary formats in general, especially for open, extensible web standards. Text formats are easier to extend as the language evolves because they typically have some amount of redundancy built in. The W3C outlines the value here (https://www.w3.org/People/Bos/DesignGuide/implementability.h...). A text format for JS also means engines/interpreters/browsers are simpler to implement, and therefore that JS code has better longevity.

Finally, although WebAssembly is a different beast and a different language, it provides an escape hatch for large apps (e.g. Facebook) that want to go to extreme lengths in the name of speed. We don't need to complicate JavaScript when such a powerful mechanism, tuned to perfectly complement it, already exists.

[0]: https://github.com/syg/ecmascript-binary-ast/#-2-early-error...




Early benchmarks seem to support the claim that we can save a lot on JS parsing costs.

We are currently working on a more advanced prototype on which we will be able to accurately measure the performance impact, so we should have more hard data soon.


It seems like one big benefit of the binary format will be the ability to skip sections until they're needed, so the compilation can be done lazily.

But isn't it possible to get most of that benefit from the text format already? Is it really very expensive to scan through 10-20MB of text looking for block delimiters? You have to check for string escapes and the like, but it still doesn't seem very complicated.


Well, for one thing, a binary format’s inherent “obfuscatedness” actually works in its favor here. If Binary AST is adopted, I’d expect that in practice, essentially all files in that format will be generated by a tool specifically designed to work with Binary AST, that will never output an invalid file unless there’s a bug in the tool. From there, the file may still be vulnerable to random corruption at various points in the transit process, but a simple checksum in the header should catch almost all corruption. Thus, most developers should never have to worry about encountering lazy errors.

By contrast, JS source files are frequently manipulated by hand, or with generic text processing tools that don’t understand JS syntax. In most respects, the ability to do that is a benefit of text formats - but it means that syntax errors can show up in browsers in practice, so the unpredictability and mysteriousness of lazy errors might be a bigger issue.

I suppose there could just be a little declaration at the beginning of the source file that means “I was made by a compiler/minifier, I promise I don’t have any syntax errors”…

In any case, parsing binary will still be faster, even if you add laziness to text parsing.


> a simple checksum in the header should catch almost all corruption

For JavaScript, you have to assume the script may be malicious, so it always has to be fully checked anyway.

It's true that the binary format could be more compact and a bit faster to parse. I just feel that the size difference isn't going to be that big of a deal after gzipping, and the parse time shouldn't be such a big deal. (Although JS engine creators say parse time is a problem, so it must be harder than I realise!)


> For JavaScript, you have to assume the script may be malicious, so it always has to be fully checked anyway.

The point I was trying to make isn't that a binary format wouldn't have to be validated, but that the unpredictability of lazy validation wouldn't harm developer UX. It's not a problem if malicious people get bad UX :)

Anyway, I think you're underestimating the complexity of identifying block delimiters while tolerating comments, string literals, regex literals, etc. I'm not sure it's all that much easier than doing a full parse, especially given the need to differentiate between regex literals and division...


I was figuring you could just parse string escapes and match brackets to identify all the block scopes very cheaply.

Regex literals seem like the main tricky bit. You're right, you definitely need a real expression parser to distinguish between "a / b" and "/regex/". That still doesn't seem very expensive though (as long as you're not actually building an AST structure, just scanning through the tokens).

Automatic semicolon insertion also looks fiddly, but I don't think it affects bracket nesting at all (unlike regexes, where you could have an orphaned bracket inside the literal).
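To illustrate both points with a contrived snippet (mine, not from any real app): a regex literal can legally contain an unbalanced brace, and the same "/" can begin either a regex or a division, so a scanner that only matches brackets and string escapes will mis-nest scopes unless it effectively parses expressions.

    function f(x) {
      return /}/.test(x)      // "}" inside a regex literal
        ? x.length / 2        // "/" as plain division
        : 0;
    }
    console.log(f("a}b"));    // 1.5

    // A scanner that skips strings but doesn't know which "/" starts a
    // regex sees two "}" for one "{" here and mis-nests the blocks.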

Overall, digging into this, it definitely strikes me that JS's syntax is just as awkward and fiddly as its semantics. Not really surprising I guess!


Early error behavior is proposed to be deferred (i.e. made lazy), not skipped. Additionally, it is one of many things that require frontends to look at every character of the source.

I contend that the text format for JS is in no way easy to implement or extend, though I can only offer my personal experience as an engine hacker.


If early errors are deferred, then they're no longer early... that's all I meant by skipped. It's still a semantic change that's unrelated to a binary AST.


Indeed it's a semantic change. Are you saying you'd like that change to be proposed separately? That can't be done for the text format, for obvious compat reasons. It also has very little value on its own, as it is only one of many things that prevent actually skipping inner functions during parsing.


Our goal is not to complicate Javascript, but to improve parse times. Fundamentally that boils down to one issue: engines spend too much time chewing on every byte they load. The proposal then is to design a syntax that allows two things:

1. Allow the parser to skip looking at parts of code entirely.

2. Speed up parsing of the bits that DO need to be parsed and executed.

We want to turn "syntax parsing" into a no-op, and make "full parsing" faster than syntax parsing currently is - and our prototype has basically accomplished both on limited examples.
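As a rough sketch of why skipping can become a near no-op (a hypothetical length-prefixed layout of my own, not the actual Binary AST encoding): if each function node carries its body length, a lazy parser jumps past the body in O(1) instead of scanning every character for the matching brace.

    // Hypothetical layout: [tag: 1 byte][bodyLength: u32 LE][body bytes]
    const FUNCTION_TAG = 0x01;

    // Return the offset just past this node, without decoding its body.
    function skipNode(view, offset) {
      const bodyLength = view.getUint32(offset + 1, true);
      return offset + 1 + 4 + bodyLength;
    }

    // A function node with a 3-byte body ends at offset 8.
    const buf = new Uint8Array([FUNCTION_TAG, 3, 0, 0, 0, 0xaa, 0xbb, 0xcc]);
    console.log(skipNode(new DataView(buf.buffer), 0)); // 8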

> A text format for JS also means engines/interpreters/browsers are simpler to implement, and therefore that JS code has better longevity.

As an implementor, I have to strongly disagree with this claim. The JS grammar is quite complex compared to an encoded pre-order tree traversal. It's littered with tons of productions and ambiguities. It's also impossible to do one-pass code generation with the current syntax.
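A classic instance of the lookahead problem (standard JS, my own illustration): the same prefix can be a parenthesized expression or an arrow function's parameter list, and the parser only finds out after the closing ")".

    let a = 1, b = 2;
    (a, b);            // parenthesized comma expression, value 2
    (a, b) => a + b;   // identical prefix, but now a parameter list

    // Engines handle this with a cover grammar: parse permissively,
    // then reinterpret once "=>" is (or isn't) seen.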

Parsing an encoded pre-order tree traversal doesn't even require a general context-free parser (it can be implemented on top of a deterministic PDA). It literally falls into a simpler class of parsing problems.
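For comparison, here is a minimal decoder for a toy pre-order encoding (my own invented format, not the proposal's wire format): each node is a tag byte followed by a child count, and one linear pass rebuilds the tree with no lookahead and no ambiguity.

    // Toy encoding: each node is [tag, childCount, ...children], pre-order.
    function decode(bytes, pos = 0) {
      const node = { tag: bytes[pos], children: [] };
      let next = pos + 2;
      for (let i = 0; i < bytes[pos + 1]; i++) {
        const [child, after] = decode(bytes, next);
        node.children.push(child);
        next = after;
      }
      return [node, next];
    }

    // decode([1, 2, 5, 0, 6, 0])[0]
    //   => { tag: 1, children: [{ tag: 5, ... }, { tag: 6, ... }] }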

> The binary AST format proposes to skip this step [0] which is equivalent to wrapping function bodies with eval…

This really overstates the issue. One can equally rephrase that statement as: if you are shipping JS files without syntax errors, then the behaviour is exactly identical.

That brings into focus the real user impact of this: developers who ship syntactically incorrect JavaScript to their users will have their pages fail slightly differently than they do currently.

Furthermore, the toolchain will simply prevent JS with syntax errors from being converted to BinaryJS, because the syntactic conversion is only specified for correct syntax - not incorrect syntax.

The only way you get a "syntax" error in BinaryJS is if your file gets corrupted after generation by the toolchain. But that failure scenario exists just the same for plaintext JS: a post-build corruption can silently change a variable name and raise a runtime exception.

So when you trace the failure paths, you realize that there's really no new failure surface area being introduced. BinaryJS can get corrupted in exactly the same ways, with the same outcomes, as plaintext JS can right now.

Nothing to worry about.

> We don't need to complicate JavaScript when such a powerful mechanism, tuned to perfectly complement it, already exists.

We need to speed up Javascript more, and parsing is one of the longest-standing problems; it's time to fix it.

Wasm is not going to make regular JS go away. Codebases in JS are also going to grow, and as they grow, the parsing and load-time problem will become more severe. The onus is on us to address it for our users.



