Proposal for a Binary Encoding of the JavaScript AST (github.com/syg)
45 points by bobajeff on Aug 2, 2017 | 90 comments



Jesus Christ. The solution to "our webpage has 7.1 MegaBYTES of JavaScript" isn't "let's use a more compact representation of the code", it's STOP FUCKING ADDING MORE JS SHIT.

------

It's basically:

Users: "My house is on fire!"

These devs: "Ok, we're here to help, we cleaned your windows. You can see the fire from outside so much better now!"

------

Maybe they should solve the problem of their website being a bloated JavaScript blob-monster, rather than just trying to fit it into a smaller box.


I feel like this is an unfair criticism unless you know what the page is doing.

People are building entire vector editing programs on the web. How can you be fine with Photoshop being several hundred megs but not with this person having less than 10 megs of code to achieve something similar?

The web isn't just Hacker News and CNN. People write real programs on it.

Obviously shipping less code is Good(TM). But isn't an across the board improvement also good?


>People are building entire vector editing programs on the web.

This is the exception rather than the rule. There is no reason why it should take more than ten seconds to load a static news article with text.


You, and the original starter of this comment thread, have completely missed the point. The linked page clearly uses complex applications as examples. Whatever anyone's opinion of them may be, they are quite complex.

This proposal is aimed at web applications, not at simple websites with image/text content and minimal JavaScript.


Yet Google Sheets and Gmail are applications, while Facebook and LinkedIn are websites that might as well be fully static and yet are twice the size, so his point is kinda correct.

Mind you, Gmail also has features like Chat as do Facebook and LinkedIn.


Facebook.com does a lot more than show a newsfeed, profiles, and notifications.

  - It contains a fully-fledged messenger application that supports payments, videoconferencing, sharing rich media etc
  - The newsfeed supports interactive 360-degree video, a live video player, mini-games in the newsfeed, and lots of other rich/interactive media
  - It's a gaming platform for 3rd party games
  - It's a discussion forum, groups management system, event planning UI
  - Photo sharing and editing platform, as well as live video streaming tool
  - A platform for businesses to have an online presence (Pages)
  - A peer-to-peer online marketplace (called "Marketplace")
And a dozen other things I can't think of right now. You might say "but I don't use all of those things". That's another tricky part: every user has a different "configuration" and different types of content in their newsfeed at any given moment, requiring them to be served different sets of JavaScript code.


Most of these features are unnecessary until the user tries to play a video, a game, opens a dialog etc.


Right, and that's what happens today: the JS for secondary functionality is loaded on demand.

Here's what I have in my FB homepage during a random load:

  - Search bar for searching people/groups/posts/pages
  - News ticker
  - Friend requests, Notifications
  - Sidebar ads
  - A rich text editor for sharing my status
  - A newsfeed story with a special "memory - 3years ago" feature
  - Comments & commenting UI under newsfeed stories
  - Suggestions for "People You May know"
  - A video auto-playing a clip from a friend, with capability to auto-select between tracks of different video quality based on bandwidth (including bandwidth estimator code)
  - And probably a dozen different A/B experiments that I'm a part of
It takes a lot of code just to render all these UI elements. If I interact with any of them, additional code is loaded (you can see this in the Network monitor).

This homescreen UI is as rich as any desktop application and requires no less code to render. The problem being addressed in this proposal is that a native version of this app would start a lot faster than the web version. That's because a browser will parse all the code files loaded at startup (inefficiently, by necessity), while a native app will only read the code for the code paths that are actually executed.

Basically, O(code executed) is a lot better than O(all files containing code that is executed). And this proposal features a more parser-friendly encoding and a change to the parsing semantics.


I think the point was that the premise of the OP's original "solution" was a sweeping generalization that doesn't apply to non-trivial web _apps_, which have a large amount of inherent complexity that is then necessarily reflected in the size of their codebases. This proposal aims to provide a real solution to tackle that problem.


That should take less than one second, not ten.


As you probably know, building a sophisticated application can easily become quite complex. This is the case even if the application is designed to look unsophisticated, like Twitter or Facebook, for instance. Now, when we are discussing desktop or mobile applications, people typically do not care all that much about the code size, because it is essentially invisible to the user. For instance, who cares that the Steam binary (just the binary, without resources or dynamic dependencies) weighs 3.1 MB, that the Gimp binary (again, without resources or dynamic dependencies) weighs ~9 MB, or that the Telegram binary weighs ~11 MB?

On the web, of course, since the application is downloaded dynamically, rather than being installed, people don't expect to wait and they start noticing. In some cases, web developers can fine tune their JS to make sure that less of it is shipped. Very quickly, however, this becomes horribly expensive, because it's a complicated problem with any language and JS is really not suited to static analysis and static code optimizations.

Of course, another part of the problem is that the delivery format of JS (i.e. the source code) is optimized for small files. For larger files, we can do much better. This is the objective of this project.


> or that the Telegram binary weighs ~11 MB?

The memory footprint of a native app and a JS app is very different. While an 11 MB Windows app won't require more than 11 MB of memory for its code, an 11 MB JS app might require some 50-100 MB of memory just to keep the code, various caches, and internal structures.


I'm more than happy to discuss memory footprints, but since this proposal is about parsing speed and file size, that might be a little off-topic :)


This is quite mean.

There is nothing wrong with doing both this and cutting down on bloat.


There’s nothing wrong with it, but it won’t happen.

If no one had ever cared about performance in browser engines, we wouldn’t have the problem of massive pages that we have now. Things like Facebook would have to be more efficient about how they do things.

By making loading faster, you reduce the incentive for them to reduce their own bloat.

Of course, on the other hand, if browsers were still slow, there are various things that people can do now which they wouldn't have bothered with before. A vector graphics engine or a sound engine, for example, would have been impossible with today's browser functionality but the performance of ten years ago, because it would have been unbearably slow.

You win some, you lose some. Mostly you just end up with an unhappy balance that keeps no one really happy.


I agree it won’t happen but that doesn’t make the original comment any better.


We're not stopping until facebook.com is the same 240 MB download that Facebook for iOS is.

...but service workers so users will love it!!! ...right?


I don't like the part that explains the motivation for implementing binary encoding:

> A brief survey from July 2017 of the uncompressed JS payload sizes for some popular web applications on desktop and mobile:

> Facebook 7.1 MB

I think Facebook should just remove the code they don't need rather than invent new compression methods. There is no way a social network with a chat needs so much JS code.


Just 7.1 MB of source? That's smaller than some desktop chat applications used to be, even when they basically just used the default Windows styles.


You forget that Facebook's code is executed inside a browser (for example, it doesn't have to render the UI, because the HTML/CSS engine does that), while "desktop chat applications" can contain a WebKit engine, Node.js, and a lot of bloated uncompiled libraries.

7 MB of JavaScript is too much. 200-300 KB should be enough to view photos and post comments.


To be fair, pretty much any platform (operating system) provides rich APIs and basic functionality like built-in UI components. So I don't think there's a lot more pre-existing functionality in a browser vs Windows or Android.


I think you are wrong. The Windows API is very low-level compared to HTML APIs. For example, there is no automatic layout and you must specify the exact size and position of every GUI element, like a button or an input field.


Some apps use a GUI framework, and don't combine it with their app into a single file, which makes for a better comparison (e.g. Paltalk does this).


This is kind of a noob question, but do you have any idea if Javascript actually gets fully parsed on the client side every time it's used, or if the AST gets cached? (Let's assume the "client side" is a newish version of Chrome.)


Chrome now caches compiled code: https://v8project.blogspot.ca/2015/07/code-caching.html

In addition to this, browsers do lazy parsing. They do not fully parse the bodies of functions until the functions are executed.


Note that "lazy parsing" means "do most of the parsing, just don't store the result". That's a consequence of the specifications of JavaScript that require all syntax errors to be thrown as early as possible.

This is something that would change with the Binary AST, hence allowing much lazier parsing.
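A tiny illustration of the constraint described above (my example, not from the proposal): under today's semantics the whole script is rejected while it is being parsed, even though the broken function is never called.

  function neverCalled() {
    return 1 +;   // SyntaxError is reported at load/parse time,
  }               // not when (or if) neverCalled() ever runs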


Firefox now caches bytecode, too.

https://groups.google.com/forum/#!topic/mozilla.dev.platform...

Note, however, that caching bytecode solves a different problem. Caching bytecode is great for the second+ load of a website. Making initial parsing faster is great for the first load of a website since its latest update. If you consider that Facebook updates its front-end once per hour, for instance, that's a lot for some (not all) websites.

Also, the size improvements may come in useful for applications written in, say, Cordova/PhoneGap.


I meant "several times per day".


Chrome and Firefox both cache hot code in its compiled (bytecode) form. This proposal addresses cold code loads. Many web apps update frequently (more than once per day) making caching much less effective, and cold code much more common.


It's also possible to dynamically, lazily load some parts of the JS code. Surely not all the code is needed all the time. For example, you may not need to load the code for the settings page until the user actually goes to edit the settings.
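As a rough sketch of that kind of deferral (the module path and function name here are hypothetical), dynamic import() only fetches and parses the code the first time it is needed:

  document.querySelector('#open-settings').addEventListener('click', async () => {
    // './settings-page.js' and openSettingsDialog() are hypothetical names;
    // the module is fetched and parsed on first use, then cached by the engine.
    const settings = await import('./settings-page.js');
    settings.openSettingsDialog();
  });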


Yes, Facebook already lazily loads code for secondary functionality. More details in this comment https://news.ycombinator.com/item?id=14912871


Please also look at my comment that shows the opposite: Facebook is loading a lot of unnecessary modules: https://news.ycombinator.com/item?id=14916177


Yup, replied above. Thanks for digging into this more :)


Still, in my experience, Facebook chat loads several megabytes (that's gzipped size) of JavaScript upon startup. In this case, lazy loading gets in the way of performance, as Facebook chat ends up lazily loading 102 files (if I recall correctly).


Facebook already only ships code that is needed to render the current page. It even goes further and it streams the code in phases so that the browser renders the UI incrementally and the page becomes interactive as soon as possible. It then pulls in code dynamically for any secondary feature only if the user interacts with that feature.

Some of the design is documented here https://www.facebook.com/notes/facebook-engineering/bigpipe-...

Facebook has done a TON of optimization. The fundamental issue is that Facebook is not a webpage, it's a full application. And the Web as an application platform lags behind native platforms with respect to startup performance.

I explained why Facebook.com is really an app in this comment https://news.ycombinator.com/item?id=14912393

Now compare the functionality of the Facebook Android app vs the Facebook desktop webpage (nearly identical) and look at their respective installed sizes (~180 MB vs ~7 MB).

This proposal will benefit both web apps and web documents. And more importantly, it will allow people to build richer, more sophisticated applications on the web without sweating over extra kilobytes in their JS code size.


Most of the code doesn't have to be loaded immediately. For example, you write in your comment, that Facebook can play 360 degrees videos, but the player doesn't have to be loaded until the user tries to play such a video.

I decided to look at Facebook's code more closely. It is modular and contains several thousand small modules. For example, on a news feed page the browser has loaded 66 JS files with a total size of 5.7 MB, containing 3062 modules [1].

But it is clear that many of those modules are not necessary to display the page. For example, a module named "XP2PPaymentRequestDeclineController" that is probably related to payments is not necessary. Or a module named "MobileSmsActivationController". Obviously Facebook preloads a lot of code for different dialogs that might be unused most of the time the page is loaded.

Of course, I understand that it is very difficult to optimize code when a large team is constantly writing new code and everybody has strict deadlines, so there is no time for optimizations, especially if they require serious refactoring.

[1] https://gist.github.com/codedokode/cb506cee367bdb9e1071bc186...


Facebook.com today loads functionality dynamically. Open the Network panel, interact with a secondary feature, and you will see it load code on-demand.

With respect to your example of unnecessary modules, sometimes the dependency trees between modules are non-obvious. But more to the point, the code served to a user is NOT personalized for each individual user and to their specific newsfeed and UI contents. This is actually a performance optimization. Facebook looks at which modules are most commonly required across most users based on their recent activity, and then bundles those common modules into large packages and pushes them aggressively to the browser. This actually results in a very large loading-time win, but ends up overserving some % of extra modules that are not needed by a specific user.
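As a rough sketch of that trade-off (hypothetical data shapes, not Facebook's actual pipeline): pick the modules requested by a large enough share of page loads and pre-bundle them, accepting that any single user gets over-served a few modules they won't use.

  // requestLogs: one array of module names per observed page load (hypothetical shape)
  function pickCommonModules(requestLogs, minShare) {
    const counts = new Map();
    for (const modules of requestLogs) {
      for (const name of new Set(modules)) {
        counts.set(name, (counts.get(name) || 0) + 1);
      }
    }
    // Modules requested in at least minShare of page loads go into the
    // aggressively pushed common bundle; everything else stays lazily loaded.
    return [...counts]
      .filter(([, n]) => n / requestLogs.length >= minShare)
      .map(([name]) => name);
  }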


Honest question: isn’t this problem already addressed by WebAssembly’s binary encoding?


Here, "the problem" is that parsing JavaScript is very slow (and the second problem is that sending these files can be pretty slow, too). WebAssembly cannot really ship JavaScript code. Well, I guess it could if someone were to write a JavaScript to native compiler and ship it as WebAssembly, but then you would end up with more problems:

- yet another JavaScript target, with possible incompatibilities, security issues, etc;

- you still need to ship the binaries, which will most likely be much larger (and much slower to parse) than your initial source code;

- "interesting" interactions between Wasm-compiled JS and browser-compiled JS, as well as the DOM, garbage-collection, ...


WebAssembly is a similar idea, but it only encodes the asm.js subset of JavaScript, not the full language.


Do you think that Google Sheets should cut back on their 6 MB of code?


I love the idea of binary ASTs as the thing browsers compile instead of source, but I don't think making browsers even more complex is worth it. These new binary files would also be yet another type of asset to serve alongside your .js files, with implications for debugging and source maps. If the reward is that bloated SPAs load 10% faster, while everyone ships both JS and binary JS forever, I wonder whether browsers should parse the binary format or just keep optimizing their text parsers.


This seems pretty easy though.

When you parse, you parse to the same JS AST.

When you show it in the browser, you show the text form of the JS AST. Source maps like normal.

It's a reasonably lightweight proposal, though IDK if it will be compelling enough.


It might be a good idea. But scripts inside HTML are not compiled because every user should have the right to see what gets executed on their device. Or is this just my fantasy and not a design choice by the early engineers? I feel really conflicted about wasm and this.

Maybe now we can just ship index.exe and forget about the openness of the web already.


I don't know of anyone who actually audits the JS they run. I know people (like Richard Stallman) who don't run JS at all, and people (like my roommate) who use NoScript with exceptions for websites they trust—which is trust by website, not by code; they'll trust any JS that origin produces in the future. I suppose there are people running GNU LibreJS and running any JS that claims to be freely licensed, but again without looking at the code to see if it's heavily obfuscated. And then there's the rest of us, the vast majority, who just run all the JS we see.

If we already had people who audit JS before running it, preserving auditability would be worth doing, but keeping the ability to do something that (I think) literally nobody has done in over 20 years of JavaScript isn't worth trading off actual technical improvements for.

Fortunately we have something very different and (IMO) more practical, that makes this very different from an "index.exe" model: we have isolation between websites enforced at the browser level. aim.exe can read my saved quicken.exe documents. hangouts.google.com cannot interact with my bankofamerica.com account; it can't even know that I have one.

I think I have a right to see source to all the code that runs on my device with full privilege. I also think I have a right to limit the powers of code to which I don't have source, including both limiting what sorts of things it has access to and how much CPU, memory, etc. it's allowed to consume. I don't really know that insisting on the right to see source to code that runs confined is actually going to measurably improve my computing freedom.


> aim.exe can read my saved quicken.exe documents. hangouts.google.com cannot interact with my bankofamerica.com account; it can't even know that I have one.

Although these days we have sandboxed native software as well. iOS, Android, Mac App Store and I believe Windows has something like this now too?


Anyway, with this format, Devtools will be able to display the source code. It's optimized for sending + parsing, but it can be decompiled without loss of anything meaningful.

There is even a proposal to support keeping comments in the file.


This is an encoding of the JS AST, so it has a 1-to-1 mapping to JS (modulo whitespace, but the JS would probably be minified anyway). It would be trivial to build a tool to convert this back into JS, and browser devtools will probably do that automatically.
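For intuition, the same kind of round trip can already be done with off-the-shelf tools (esprima and escodegen here, which have nothing to do with the proposal's own encoder): the source becomes an AST and comes back out as equivalent source, losing only formatting.

  // npm install esprima escodegen
  const esprima = require('esprima');
  const escodegen = require('escodegen');

  const ast = esprima.parseScript('function add(a,b){return a+b}');
  console.log(escodegen.generate(ast));
  // prints a reformatted but semantically identical version of the function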


Technically, the tool to convert back to JS is already implemented. For the moment, it loses only comments and layout, and even comments might eventually be preserved (see issue #11 on GitHub).


It seems that you (like a great number of other people) assume that an ordinary person without special skills cannot comprehend what some code does once it is compiled. But that assumption is fortunately wrong. It is easier than you think to learn to read disassembly of machine code. It would be much easier to read higher level bytecode such as Android VM's smali.

So don't fear compilation. What is truly worrying is any attempt to restrict access to code that is run by a computer. Even a highly skilled reverse engineer cannot do anything if there is no way to access code.


s/ordinary person without special skills/someone who could understand the source code of the compiled artifact/ and I'm with you.


Nowadays, most JS is uglified/minified on build anyway. I believe that times when after-build JS code was readable for the end user are long gone.


Actually, this format is much more readable than minified JS, as it maintains variable names, etc. (there is a proposal to also support comments without meaningful loss of performance).

Of course, this does not prevent developers from minifying/uglifying the JS code first, if they so decide.


It just feels like putting lipstick on a pig: using binary to solve the problem of bloated UTF-8 files.


UTF-8 isn't "bloated." A UTF-8 file containing only ASCII characters (every character you'll see in JS, with the exception of foreign-language strings) is byte-for-byte identical to the ASCII encoding. You may be thinking of UTF-16, which uses at least two bytes for every character.
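A trivial way to check this in a browser console (TextEncoder always produces UTF-8):

  const src = 'function add(a, b) { return a + b; }';
  const bytes = new TextEncoder().encode(src);
  console.log(bytes.length === src.length); // true: one byte per ASCII character
  console.log(bytes[0] === 0x66);           // 'f' encodes to the single byte 0x66, as in ASCII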


Thoughts. This looks to make Javascript even more "lisp-ish" than it already was, by simplifying syntax. Albeit a binary lisp.

Second thought - regular zip compression already minimises Javascript, doesn't it?

Which leads to my third thought: isn't the most important problem they are trying to solve the slow parsing speed of JavaScript text? They solve that by introducing unambiguous operators and other things. This could be done without resorting to binary encodings. They could introduce a plain-text encoding where (for instance) only UTF-8 is allowed, with unambiguous operators.

THEN they could put a binary encoding on top of that, if necessary. Or just zip it.

Instead they jumped straight at the binary representation. I'm not saying it's wrong to do so. But maybe unnecessary?

Please let me know if my 5 minute understanding of what they are doing is wrong...

EDIT: spelling.


Slow parsing speed is due to several things:

- encoding issues;

- ambiguous stuff (for instance, `/` can be the start of a comment, the start of a regexp or a division, `()` or `{}` can have several meanings, etc.);

- odd behavior (variables can be used before they are declared, double-declaring variables is sometimes valid but not always, etc.);

- plenty of information that is necessary for execution is only available after you need it (sometimes much later, e.g. determining free variables, determining whether `eval` can affect local variables, etc).

Also, this syntax makes it impossible to perform concurrent parsing (at least not without extremely costly non-concurrent pre-parsing) and makes it basically impossible to perform lazy parsing (at least not without this same extremely costly pre-parsing).
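A small self-contained example of the ambiguity and hoisting points above (my illustration, not taken from the proposal):

  console.log(later);  // valid: prints undefined, because 'var later' is hoisted
  var later = 1;

  let x = 2, g = 5;
  {} /x/g;             // '{}' is an empty block here, so '/x/g' is a regex literal
  let r = ({} /x/g);   // '{}' is an object literal here, so this is ({} / x) / g

The parser cannot know which interpretation of '/' applies without having tracked everything that came before it.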

If I understand correctly, your suggestion would have been to:

1. provide a "better" developer-facing text syntax;

2. if necessary, move to binary parsing.

Your suggestion makes sense, but I believe that it seriously underestimates the difficulty of:

a. come up with an alternative, easier to parse, text syntax;

b. make sure that your text syntax has the same semantics as the original syntax;

c. get everybody to agree on that text syntax;

d. maintain this text syntax through successive versions of JavaScript, without losing its desirable properties;

e. have every browser maintain two text parsers.

Also, note that b. most likely means standardizing upon an AST, which is roughly half of the difficulty of the current proposal.

Finally, unless I am missing something, your suggestion does not improve concurrent or lazy parsing. One of the core benchmarks behind the current proposal is that lazy parsing can be made much faster than non-lazy parsing once the expensive pre-parsing phase has been abolished.

Does this answer your questions?


I think the web could actually benefit from this and most developers these days are used to sending their client code through a compiler already so it would make a minimal difference in tooling.

Personally, though, I think webpack is pretty amazing. When I first heard of it I very much disliked it, but it combines transpiling with minification and code optimizations like tree shaking (removing unused code from the compiled result).


You don't have to parse every single module at page load. You can defer, or only load code the first time it's used. This 10-15% of time saved only applies to the first load. If you have a web "app", it's not like the user will reload it over and over again like old-school server-rendered apps (which still work fine, btw; they are even faster now with faster computers and better network latency). So if this is not the problem of classic web apps, nor the problem of modern web apps, whose problem will it solve?

Make cache-friendly modules, only abstract code that can be reused, keep the rest of the code in-line.


I wrote https://news.ycombinator.com/item?id=14908903 in answer to a different question. Does it answer your points, by any chance?


JavaScript is a fun language because it does not require a build step. Developing JavaScript is easy, and you want to make it harder just to shave off a few milliseconds!?

Show me some stats and graphs on how long it actually takes to parse JS to AST.

I care a lot for performance, if you want to make me happy, make text rendering in the canvas faster. And if you could also make eval faster I would be very happy.


Most professional development I have seen in the past few years already uses a build step, either to pack dependencies, to run linting or to polyfill language features, so this Binary AST will not really change that. Indeed, I hope that Babel, Webpack & co will eventually ship with an option `--binary-ast` and that the only change for developers will be to add this option to their toolchain.

However, if you wish to write JavaScript without a build step, you will of course be free to continue doing so. I don't think that anybody has ever thought of making the Binary AST compulsory. It's just a build target that you can use for decreasing file size and parse time.

For stats, see the proposal. We'll add more stats once the advanced prototype is complete.

> I care a lot for performance, if you want to make me happy, make text rendering in the canvas faster. And if you could also make eval faster I would be very happy.

I'm sure both are in progress, but entirely orthogonal to this proposal.


> For parsing desktop facebook.com, 10-15% of client-side CPU time is spent parsing JavaScript. The prototype we implemented reduced time to build the AST by 70-90%.

Whatever the win from their prototype, it should never be a reason to change core JavaScript behavior like .toString(). Having a Facebook representative in this little proposal club doesn't feel good. My first thought was: please optimize your own codebase and don't break other people's code for your own profit.


> don't break other people's code for your own profit.

How is adding a new kind of function (created from the AST encoding instead of source) with its own definition of .toString() going to break other people's code?

If you don't use binary ASTs you will never even see the new behavior. Unless you are the one mucking around with other people's code, in which case:

  >> Function.prototype.toString.toString()
     "function toString() {
        [native code]
     }"
Function.prototype.toString is already broken for functions without available source.


Out of curiosity: do you actually use `Function.prototype.toString()`? I haven't seen any use of this method in years. Plus the few uses that I have seen were libraries that attempted to rewrite other libraries and broke at pretty much each update of their dependencies.

There may be legitimate uses of this method in production code, but I can't think of any off the top of my head. If you can think of one, don't hesitate to file it as an issue in the linked tracker.


`Function.prototype.toString()` is the basis for an alternate, ES5-era multiline string syntax:

    var str = (function(){/*
        STRING
        GOES
        HERE
    */}).toString().slice(14, -3);


Prior to ES2015, the spec's definition of Function.prototype.toString() was pretty vague. The behavior was implementation-dependent, although I don't know the differences per browsers, since I've never used this feature seriously. Here's the text from ES5.1 [0]:

> An implementation-dependent representation of the function is returned. This representation has the syntax of a FunctionDeclaration. Note in particular that the use and placement of white space, line terminators, and semicolons within the representation String is implementation-dependent.

[0] http://www.ecma-international.org/ecma-262/5.1/#sec-15.3.4.2


That's... an interesting use. I'm pretty sure it violates the actual specifications of JavaScript, breaks in presence of a minifier/uglifier and slows down your script, but it's a cool hack :)


Breaking Function.prototype.toString() might actually be a good idea, because right now the engine has to keep the source code in memory even though most of the time it is absolutely unnecessary. It would be better to free this memory.


This suggestion also requires a JS engine to have an AST compatible with the one that is described. So browser developers will have less freedom in the choice of their internal AST representation.


For what it's worth, I'm currently implementing the Binary AST parser (using the current candidate AST) in SpiderMonkey (which uses its own AST, of course) and that doesn't cause any issue so far.

The AST used in the file is a convenient abstraction but doesn't need to match the AST used in-memory.


It seems like this is a very small gain. After the initial fetch everything should be cached locally. If it's not you did it wrong.

Am I missing something? How is this better than normal web caching?


Yes and no.

You are right, after the initial fetch, everything should be cached locally. However, caching traditionally does not affect parsing speed. While size gains are a nice side-benefit of this proposal, the main benefit is parsing speed.

Now, both Chromium and Firefox have introduced smarter caching that caches post-parsing data for websites that you have already visited (or visit often; I don't remember the heuristics). This is very useful for all the sites that you visit regularly and that do not update their JS code between your visits. However, for all the sites that do update their JS code between your visits, including Facebook (which updates several times per day), Google Docs, etc., and every random website that you read because of a link on Hacker News but will never revisit, there is a potentially pretty large size benefit.

Does this answer your question?


It could be called "Java".


Send your JavaScript as an image, then unpack it using WebGL.


Isn't "binary AST" called "WebAssembly"?

Oh. Right. I forgot. "Web" assembly was specifically designed to exclude Javascript.

So yes, go ahead. Create another incompatible binary format


People are downvoting my comment for some reason.

The whole premise of "binary JavaScript AST" is "compile JavaScript into a binary representation because bundle sizes are just too big and take too much time to parse".

Isn't WebAssembly designed to kinda solve that and many other problems presented in the Readme among other things? Instead of parsing JS (or whatever language) you just download binary (which has been compiled, optimised etc.).

So, instead of designing webassembly to support the most important language on the web, the committee (as always, there's an effing committee to committee the hell out of things) designs webassembly with just one single language in mind: C++. Because reasons.

Hence, this ridiculous proposal: oh, let's create a binary representation of Javascript AST because reasons. Because "oh, we can't compile to webassembly because webassembly hasn't even been designed to support the main language on the web and many other major languages".

Instead, we're just going for the lame excuse of "no vendor would agree to ship bytecode." (why would they agree to ship binary ASTs?) and "there is no guarantee that it would be any faster to transmit/parse/start" (where's the guarantee with binary AST?).


Well, if you look at the design goals of WebAssembly [1], you'll see very different high-level goals. Sure, there is some intersection, but not nearly as much as you seem to believe.

Now, there are very good reasons to not ship bytecode (e.g. because once you standardize the bytecode on which your VM runs, you're dead in the water and your VM stops evolving). The binary AST was carefully designed to avoid this issue.

Indeed, as written in the proposal, there is no guarantee that sending bytecode would be any faster to transmit/parse/start. Moreover, experiments suggest that it wouldn't. On the other hand, if you have read the proposal, you have seen that experiments with binary AST suggest that it is faster to transmit/parse/start.

As for committees, well, that's how the web works. It is certainly not perfect, but it beats single-vendor-decides-everything.

[1] https://github.com/WebAssembly/design/blob/master/HighLevelG...


> Well, if you look at the design goals of WebAssembly [1], you'll see very different high-level goals. Sure, there is some intersection, but not nearly as much as you seem to believe.

Yes. Because it's not tied into a single programming language

> Now, there are very good reasons to not ship bytecode (e.g. because once you standardize the bytecode on which your VM runs, you're dead in the water and your VM stops evolving). The binary AST was carefully designed to avoid this issue.

Oh. Right. Because binary AST is not something that will be standardized on, and once you're set on a particular version of the AST, you're not dead in the water?

Oh. Wait. Let me quote the proposal:

--- quote ---

The grammar provides the set of all possible node kinds and their ordered properties, which we expect to be monotonically growing. However, not all vendors will support all nodes at any given time.

... parsing a correctly formatted file fails if a parser encounters a feature that the engine does not implement

--- end quote ---

Same problems

> Indeed, as written in the proposal, there is no guarantee that sending bytecode would be any faster to transmit/parse/start. Moreover, experiments suggest that it wouldn't.

Let me quote the WebAssemply READMEs

--- quote ---

Wasm bytecode is designed to be encoded in a size- and load-time-efficient binary format. WebAssembly aims to execute at native speed by taking advantage of common hardware capabilities available on a wide range of platforms.

The kind of binary format being considered for WebAssembly can be natively decoded much faster than JavaScript can be parsed (experiments show more than 20× faster). On mobile, large compiled codes can easily take 20–40 seconds just to parse, so native decoding (especially when combined with other techniques like streaming for better-than-gzip compression) is critical to providing a good cold-load user experience.

--- end quote ---


Mmmhhh... I believe that I am starting to understand your point. If I understand correctly, you regret that WebAssembly does not offer a set of opcodes that would let it trivially (or at least simply) compile JavaScript, right?

If so, let's reset the discussion, because we were actually talking past each other.

Yes, such opcodes might essentially solve the issues that the binary AST is attempting to solve, albeit probably at the cost of losing the source code of JavaScript.

I have not been part of the debates on Wasm, but I believe that I can extrapolate some of the reasons why this wasn't done:

1. Coming up with a decent target for statically compiled languages whose compilation is basically well-known is much easier than coming up with (and maintaining) a decent target for a dynamic language whose specification is amended roughly once per year.

2. Each browser vendor has its own JS-specific bytecode format. Transpiling a neutral bytecode to a vendor-specific bytecode isn't particularly easy.

3. If a vendor decides to expose their own bytecode format and let third-parties ship to this format, they suddenly get a low-level compatibility burden that will be extremely damaging to their own work on the JS VM.

4. If the JS opcodes are high-level, suddenly, Ecma needs to maintain two different specifications for the semantics of the same language: specifications by interpretation and specifications by compilation. On the other hand, if the JS opcodes are low-level, there is a pretty large burden for developers on maintaining a JS-to-bytecode compiler and making sure that it has the exact same semantics as JavaScript. In the latter case, there is also a pretty good chance (and some early experiments) that the compiled files will be much larger and much slower to parse.

In comparison with these issues, coming up with a simple Binary AST format is rather simple, hence has much higher chances of achieving success and consensus. Additionally, we have encouraging numbers that indicate that Binary AST can speed up things a lot. Finally, the Binary AST has several interesting side-benefits, including the fact that it maintains source code readability.


> Mmmhhh... I believe that I am starting to understand your point. If I understand correctly, you regret that WebAssembly does not offer a set of opcodes that would let it trivially (or at least simply) compile JavaScript, right?

Yes. Because SURPRISE it's called webassembly

> Each browser vendor has its own JS-specific bytecode format. Transpiling a neutral bytecode to a vendor-specific bytecode isn't particularly easy.

Newsflash: webassembly is:

1. based on asm.js

2. supported (eventually) by all browser vendors

3. is specifically targeting the web

4. is already shown to be a significant improvement (see, for example, https://blog.figma.com/webassembly-cut-figmas-load-time-by-3...)

> If a vendor decides to expose their own bytecode format

Why would they want to expose their own bytecode format? The entire point of WebAssembly was to be a cross-platform format.

> If the JS opcodes are high-level, suddenly, Ecma needs to maintain two different specifications for the semantics of the same language: specifications by interpretation and specifications by compilation.

Why? No one maintains two different specifications for C++ just because it compiles to JavaScript (via Emscripten) and to WebAssembly.

Meanwhile, with binary AST, Ecma will indeed need to maintain two different specifications

> In comparison with these issues, coming up with a simple Binary AST format is rather simple, hence has much higher chances of achieving success and consensus.

Issues that you pulled out of nowhere, frankly

> Additionally, we have encouraging numbers that indicate that Binary AST can speed up things a lot.

We have encouraging numbers that WebAssembly speeds up things a lot. Yet, you insist "oh, we don't know, it might, or it might not".

> Finally, the Binary AST has several interesting side-benefits, including the fact that it maintains source code readability.

Quoting WebAssembly FAQs:

--- quote ---

WebAssembly is designed to be pretty-printed in a textual format for debugging, testing, experimenting, optimizing, learning, teaching, and writing programs by hand. The textual format will be used when viewing the source of wasm modules on the web.

--- end quote ---

I strongly suggest you

1. read up on WebAssembly

2. Possibly apply efforts to bring JavaScript to WebAssembly


I'm afraid that we are on vastly different wavelengths. If you feel that you can improve WebAssembly, then by all means, please do so.

I am convinced that I can do good by making Binary AST a reality, so I will focus my energy on that.

Thanks for the conversation.


So, basically, all your "arguments" for binary AST resolve to:

- binary AST will not have versioning problems unlike WASM (this is false)

- binary AST is better than ASM, because as soon as you standardize on bytecode, "your VM will not evolve" (this is false)

- binary AST can be used to view/debug source code, while WASM can't (this is false)

- binary AST is faster to load and parse than WASM (neither false nor true, because there are no comparisons or experiments)

- supporting WASM means maintaining two versions of the ECMA standard, unlike binary AST (this is false)

and so on and so on

However, Javascript is already a dumpster fire, so this effort will neither improve nor worsen the situation


WebAssembly is well suited for statically typed codebases written in C++ etc. You can't compile JavaScript to WebAssembly. Yes, you could do Web development in a statically typed language, but would you want to?


1. One of my complaints was that yes, WebAssembly for some reason decided to target C++ first.

2. From the FAQs:

Beyond the MVP, another high-level goal is to improve support for languages other than C/C++. This includes allowing WebAssembly code to allocate and access garbage-collected (JavaScript, DOM, Web API) objects

3. News flash: some people use statically typed languages for web development: TypeScript, Elm, Purescript


> One of my complaints was that yes, WebAssembly for some reason decided to target C++ first.

It's much easier to support C++ in WebAssembly. C++ and other statically typed languages can be compiled ahead of time to low-level instructions that manipulate memory or registers in a virtual CPU.

It's much more difficult to compile dynamic languages. Consider a JavaScript statement like:

  let result = a + b;

If this were a statically typed language, the compiler would know "a" and "b" are integers and can compile it into a single ADD.INT assembly instruction.

In a dynamically typed language, that "+" symbol could be an integer addition, or a floating-point addition, or a string concatenation depending on the types of "a" and "b". So what should the JS-to-WASM compiler generate? It has to generate different code to handle all the different data types, including throwing an exception for invalid types.
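Sketched in plain JS (a hypothetical helper, not actual compiler output), the dispatch that would have to be generated, or called from a runtime library, for every "+" looks roughly like this:

  function genericAdd(a, b) {
    if (typeof a === 'number' && typeof b === 'number') {
      return a + b;                  // numeric addition
    }
    if (typeof a === 'string' || typeof b === 'string') {
      return String(a) + String(b);  // string concatenation
    }
    // Real '+' also covers booleans, null, undefined, objects, etc. via the
    // spec's ToPrimitive rules; this sketch simply bails out instead.
    throw new TypeError('operand types not handled in this sketch');
  }

Every one of those branches (or the runtime library that contains them) ends up shipped in the generated WASM.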

There would be a few problems with WASM code generated by such a compiler:

1) The generated code with all this extra checking is not going to be performant.

2) The generated code would be much larger, which would hurt transfer & parsing times.

3) The compiler would essentially be outputting the JavaScript interpreter in WASM by adding all these runtime guards for types.

> Beyond the MVP, another high-level goal is to improve support for languages other than C/C++. This includes allowing WebAssembly code to allocate and access garbage-collected (JavaScript, DOM, Web API) objects

Adding GC and DOM interop will help WASM adoption, but you'll still have the issues I described above if you try to compile JS codebases to WASM.

> statically typed languages for web development

Yes, people can use statically typed languages for web dev, and if they compile them to WASM after it has GC and fast DOM interop support, they will get performance wins vs transforming their codebases to plain JS.

But there are productivity advantages from using dynamically-typed languages during development and there are existing, very large web app codebases written in JavaScript which cannot be typed.


> There would be a few problems with WASM code generated by such a compiler:

> 1) The generated code with all this extra checking is not going to be performant. 2) The generated code would be much larger, which would hurt transfer & parsing times. 3) The compiler would essentially be outputting the JavaScript interpreter in WASM by adding all these runtime guards for types.

And, of course, you have full evidence supporting these statements


I don't, you're welcome to prove me wrong if you want to whip up a basic prototype. I'm vdjeric on github.

My goal is to make sophisticated web apps faster, I'm not married to any particular approach.


> you're welcome to prove me wrong

Ahahah what?

You are the one claiming this. The burden of proof is on you


I would say that the burden of proof is on me to prove that Binary AST is a significant real-world performance win and that it will not cause undue burden on JS engine implementors.

I don't think the "burden of proof" philosophy requires me to disprove every other possible approach, right?

I explained my reasoning for my statements in the comment itself. If you believe WASM could address this use case better, and are so inclined to build a toy proof-of-concept JS-to-WASM compiler, I'd be very interested in seeing it.



