
I think this fragment captures the spirit of this piece:

A good rule of thumb is to apply this technique for objects of 10 kB or larger — but as always with performance advice, measure the actual impact before making any changes.

Although it may still not be worth it. At work I have a hand-rolled utility for mocking the backend using a .har file (which is just JSON). I use it to reproduce bugs found by the testers, who are kind enough to supply me with both such a file and a screencast.

On a MacBook Pro a 2.6MB .har file takes about 140ms to parse and process.




I find this really interesting, because at some point the absolute performance benefits of `JSON.parse` are overshadowed by the fact that it blocks the main thread.

I worked on an app a while ago which had to parse 50 MB+ JSON objects on mobile devices. In some cases (especially on mid-range and low-end devices) it would hang the main thread for a couple of seconds!

So I ended up using a library called oboe.js [1] to incrementally parse the massive JSON blobs, putting liberal `setTimeout`s between steps to avoid hanging the main thread for more than about 200ms at a time.

This meant it would often take 5x longer to fully parse the JSON blob than just using `JSON.parse`, but it was a much nicer UX: the UI never hung or froze during that process (at least perceptibly), and the user wasn't waiting on the parsing to use the app, since there was still more user input I needed from them at that point. So even though the parse now often took 15+ seconds, the user was usually spending 30+ seconds inputting more information, and the UI stayed fluid the whole time.
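
For anyone curious what that looks like in practice, here's a rough sketch of the pattern (the URL, the `items.*` selector, and `transform` are illustrative, not the actual app code): oboe emits nodes as they stream in, the callback only queues them, and the expensive per-item work is drained in small time-boxed batches so the main thread is never blocked for long.

  const pending = []
  const results = []
  let streamDone = false

  oboe('/api/big-blob.json')
    .node('items.*', item => {
      pending.push(item)   // keep the callback cheap: just queue the item
      return oboe.drop     // let oboe discard the raw JSON for this node
    })
    .done(() => { streamDone = true })

  function drainChunk() {
    const started = Date.now()
    // work for at most ~200ms, then yield back to the event loop
    while (pending.length && Date.now() - started < 200) {
      // transform() stands in for whatever per-item work the app needed
      results.push(transform(pending.shift()))
    }
    if (pending.length || !streamDone) setTimeout(drainChunk, 0)
  }
  drainChunk()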


If you really need to work with large data files (1 MB+), JSON is a terrible format. You should look into FlatBuffers. It's like having indexed JSON where there is no parsing cost. You can have millions of rows and nested objects, and it will only read the bytes it needs.

It is a length-prefix-encoded format, so it's pretty safe to work with in a streaming manner too.


Good article on how Facebook used them for their mobile app:

https://code.fb.com/android/improving-facebook-s-performance...


Why not just use promises?

side note: legit question, I don't do web/app dev


Because JSON.parse blocks the thread it's in, and JS is single threaded [1].

So even if you put it behind a promise, when that promise actually runs, it will block the thread.

In essence, using promises (or callbacks or timeouts or anything else like that) allows you to delay the thread-blocking, but once the code hits `JSON.parse`, no other javascript will run until it completes. And since no other javascript will run, the UI is entirely unresponsive during that time as well.
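
A quick illustration (just a sketch, `hugeJsonString` stands in for whatever large payload you have): wrapping the parse in a promise only changes when it starts, not where it runs.

  const parsePromise = text =>
    new Promise(resolve => resolve(JSON.parse(text)))  // still runs on the main thread

  parsePromise(hugeJsonString).then(data => {
    // by the time we get here, the UI has already been frozen
    // for however long JSON.parse took
  })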

[1] Technically there are web-workers, and I looked into them to try and solve this problem. Unfortunately any complex objects that get sent to or from a worker need to be serialized (no pass-by-reference is allowed except for a very small subset of "C style" arrays called TypedArrays). So while you could technically send the string to a worker and have the worker call `JSON.parse` on it to get an object, when you go to pass that object back the javascript engine will need to do an "implicit" `JSON.stringify` in the worker, then a `JSON.parse` in the main thread. Making it entirely useless for my use case.

But continuing with that same thought process, I very nearly went for an architecture that used a web-worker, did the `JSON.parse` in the worker, then exposed methods that could be called from the main thread to get small amounts of data out of the worker as needed. Something like `worker.getProperty('foo.bar.baz')` which would only take the parsing hit for very small subsets of the data at a time. But ultimately the oboe.js solution was simpler and faster at runtime.


Another trick most people don't realize: not only is the fetch API asynchronous, but response.json() does the conversion in a background thread and is non-UI-blocking.

If you have a large JSON object, you can use the fetch API to work with it. If you need to cache it, use the Cache Storage API. Unlike localStorage, which will freeze the UI, cache storage won't.

It's slightly slower since it needs to talk to another thread, but who cares, as long as the UI stays responsive to do other things.


This is a common misconception, but response.json() still blocks the main thread.

It looks like it doesn't, but the exact same symptoms will happen even while awaiting fetch's json().


I'm guessing with Oboe.js you solved this by capturing a stream(?) of JSON but only parsing relevant chunks as they appear and match the selector? Or do you simply load the larger chunks at once (either by a request or embedding JSON into the template server side) instead of streaming?

http://oboejs.com/examples#demarshalling-json-to-an-oop-mode...

I could see the value in this for sure. I currently have a problem loading a ton of JS for some users who have thousands of objects embedded in the view via Rails' toJSON() in a <script> tag. It's creating far too much weight on the frontend. I've been considering fetching it via a simple REST request instead.


Thank you for the excellent explanation!

I think of js entirely from a node.js perspective where I conceptualize it as an async task. Is this also wrong?


Node suffers from the same issues, but it's generally not as noticeable in most cases. A similar situation in Node would cause the server to not be able to respond to any other requests during the `JSON.parse` execution. But in the Node world, you have more options for getting around those problems (like load balancing requests among several Node processes).

But both server-side and client-side JS use the same system: the event loop. It's basically a message queue of events that get stacked up, and the JS engine grabs the oldest event in that queue, one at a time, and processes it to completion. Anything "async" just throws a new event into that queue to be processed. The secret sauce is that any IO is done "outside" the JS execution, so other events can be processed while the IO is waiting to complete.

Take a look at this link, or search for the JS event loop if you want a better explanation. It's deceptively simple.

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Even...
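
A tiny illustration of that behaviour (just a sketch): a queued timer event can't run until the current synchronous task finishes, which is exactly why a big `JSON.parse` freezes everything.

  const start = Date.now()
  setTimeout(() => console.log('timer fired after', Date.now() - start, 'ms'), 0)

  // busy-wait standing in for a long synchronous task like a huge JSON.parse
  while (Date.now() - start < 2000) {}

  // logs roughly "timer fired after 2000 ms", not 0: the queued event had to wait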


> Is this also wrong?

Yes, the Node.js JavaScript runtime is based on V8, the same engine that runs in Chrome. JavaScript is single threaded, so anything that is not I/O bound will block the main thread. If you don't want to block the thread because you have a long-running calculation/parsing task, you can use worker threads [1]. This will run your task in a separate thread and not block the main one.

[1] https://nodejs.org/dist/latest-v12.x/docs/api/worker_threads...
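
A minimal sketch of that (Node 12+, using the worker_threads API from the link above; `hugeJsonString` is just a stand-in). Note the caveat in the reply below about how the parsed result gets copied back:

  const { Worker, isMainThread, parentPort, workerData } = require('worker_threads')

  if (isMainThread) {
    // main thread: hand the raw string to a worker and wait for the parsed result
    const worker = new Worker(__filename, { workerData: hugeJsonString })
    worker.on('message', parsed => console.log('parsed off the main thread', parsed))
  } else {
    // worker thread: the parse happens here, off the main thread
    parentPort.postMessage(JSON.parse(workerData))
  }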


And not to beat a dead horse, but worker threads again wouldn't work in this exact situation even in Node.js. They suffer from the same problems that web workers do, meaning they use a structured clone algorithm to send data between workers (with the exception of TypedArrays), and therefore would hang the "main thread" just as long as if you did the `JSON.parse` directly in it.

It's a really annoying problem, and I'm actually really happy to see that many others have the exact same thoughts I had at the time, and that I wasn't just missing something obvious!


Fun detail: node internally will use thread pools to do CPU-intensive tasks that would normally block the main thread.

For example: https://github.com/nodejs/node/blob/master/src/node_crypto.c...

I generally use that as an example when explaining to people why Node isn't a great fit for a lot of workloads. They have to use these features internally, but you as the user with a CPU-intensive job don't have access to those features.


Maybe the worker could parse the JSON to build an index and then send over just the index. The main thread could then use the index to access small substrings of the original giant JSON string, parse those and cache the result?
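
Something like this, maybe (purely hypothetical sketch, assuming the payload is a top-level array of objects): the worker scans the raw text once, records the [start, end) offsets of each element, and only that small index crosses the thread boundary.

  // worker.js (the main thread posts the raw JSON string here first)
  onmessage = ({ data: text }) => {
    const index = []
    let depth = 0, start = -1, inString = false, escaped = false
    for (let i = 0; i < text.length; i++) {
      const c = text[i]
      if (inString) {
        if (escaped) escaped = false
        else if (c === '\\') escaped = true
        else if (c === '"') inString = false
      } else if (c === '"') inString = true
      else if (c === '{' || c === '[') {
        depth++
        if (depth === 2) start = i          // start of a top-level array element
      } else if (c === '}' || c === ']') {
        if (depth === 2) index.push([start, i + 1])
        depth--
      }
    }
    postMessage(index)                      // small, so cheap to copy back
  }

  // main thread: keep the original string, parse one record lazily when needed
  // const [s, e] = index[n]
  // const record = JSON.parse(text.slice(s, e))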


What if you used multiple async AJAX requests to load different parts of the UI instead of loading all 50MB at once? Could that be what the OP meant by "promiseS"?


Promises are a way to deal with async code. Parsing JSON is synchronous and CPU-bound, so promises offer no benefit. And since web pages are single-threaded[0], there isn't really any way you can parse JSON in the background and wait on the result.

[0]: There is now the Web Workers API which does allow you to run code in the background. I've never used it, but I have heard that it has a pretty high overhead since you have to communicate with it through message passing, so it's possible you wouldn't actually gain anything by using it to parse a large JSON object.

https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers...


Promises still run on "the main thread", so a CPU-intensive task in a promise is still going to block things. You could use a promise if you delegated your CPU task to another process or some C code that did actual threading.

I believe you'd use Workers (WebWorkers?) https://developer.mozilla.org/en-US/docs/Web/API/Workers to actually do it off the main thread entirely inside JS.


Promises still run in the main thread.

You could try to use a web worker, but then you run into the problem that they don't have shared memory, so you need to pass data back some other way.


IndexedDB is available in web workers, so that might be a good way to push the work off the main thread but still share the result.


JSON.parse is not interruptible. All the answers about promises and single threading are interesting but that's the crux of the issue.


Because JS is single threaded. If a task is too big you need to split it or the UI can't be updated until the task is done.


Is that relevant when comparing parsing JSON with parsing literal objects? I don't know much about JavaScript engines, but I'd expect that parsing literal objects in code also blocks the main thread.


I would probably break that up similarly to how you did in that case as well, though I might use multiple server requests (chunks) and/or a websocket for the data feed.

What was the memory overhead for the application?


I don't remember the details of the memory usage (it was a few years ago now), but I was pleasantly surprised to see that it wasn't nearly as bad as I first assumed it would be.

And I did originally plan on using something like a websocket, but it turns out that with some minor changes on the server side we could start streaming data while it was still being gathered, and oboe.js is actually able to start parsing data even while it's still downloading from a normal XHR request, and is designed to be as efficient as possible (so it throws away string data as soon as it's no longer needed).

So there weren't really any additional benefits to be had from using websockets, and breaking it up into multiple distinct requests would probably have been slower!

(I just realized I forgot to add a link to oboe.js! But I highly recommend it. It seems it's just gotten better since the last time I used it.)

[1] http://oboejs.com


I'm asking purely out of curiosity - what was the content of such a large JSON object?


It was a carton scanning app, so basically a massive array of objects (carton data) which were needed so the app could function and route cartons and validate deliveries entirely offline. Due to some unfortunate limitations from our clients and some edge cases, we couldn't filter down the data on the server ahead of time. So we ended up having to keep that massive amount of data on the device, and at the end of the day 95% of it would be unused, but we wouldn't know which 95% until the device was already offline.

It was a system where the goalposts moved many times during development. If I were to do it again, I wouldn't use JSON, but after having the goals change a few times and then having the original server-side components get co-opted for other projects, it was hard to justify the time it would take to switch to a different, more appropriate wire format.


I kept reading that as cartoons and I was just so confused for a second...


You can also pretty easily use a web worker now; they work well. Here's [1] an example with React hooks.

Example Fibonacci worker code that doesn't block the UI, even for larger calculations:

  // worker script: naive recursive Fibonacci, deliberately CPU-heavy
  const fib = n => (n < 2 ? n : fib(n - 1) + fib(n - 2))
  
  // receive n from the main thread, compute off-thread, post the result back
  onmessage = msg => {
    console.log('fibonacci worker onmessage', msg)
    postMessage({ num: msg.data, result: fib(msg.data) })
  }
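
And the main-thread side would be something along these lines (assuming the code above is saved as fib-worker.js):

  const worker = new Worker('fib-worker.js')
  worker.onmessage = e => console.log('fib', e.data.num, '=', e.data.result)
  worker.postMessage(40)   // the UI stays responsive while the worker crunches
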
[1] https://github.com/bharathnayak03/react-webworker-hook


Web workers won't work in this case because they need to serialize all data going into and out of them (with the exception of TypedArrays).

So passing a string to a worker and having it JSON.parse it works great. But when you go to pass that object back to the main thread, it implicitly does a JSON.stringify and a JSON.parse back on the main thread (technically it's called a "structured clone", but it's mostly the same thing), putting you in the exact same situation.


Good to know, thanks for clarifying.


And thanks to you for helping show me that I wasn't the only one to try that!

This whole thread has been really nice to read, because I beat my head against a wall for a long time before I finally found a solution, and I'm glad to read that I wasn't the only one who found this a lot harder than it looked at first glance (or second, or third...)


If this really NEEDS to be a client-side-only solution, I still believe a worker is the only way to go. Only, in this case, it needs to behave like an API. So your worker not only parses the JSON, but also responds to post messages with only the data that is requested from it.

Why? Because a large JSON structure is most probably just a large JSON structure, but you most probably don't need it as a whole. You may need a total count of items, you may need a paginated set of items, or only a certain item or a set of fields of items — well, an API.
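
A rough sketch of what that could look like (the message shapes and the `items` field are made up for illustration):

  // worker.js: parse once here, then answer small queries so only tiny
  // payloads ever cross the thread boundary
  let data = null

  onmessage = ({ data: msg }) => {
    if (msg.type === 'load') {
      data = JSON.parse(msg.text)
      postMessage({ type: 'ready', count: data.items.length })
    } else if (msg.type === 'get') {
      // e.g. { type: 'get', path: ['items', 42, 'name'] }
      const value = msg.path.reduce((obj, key) => (obj == null ? obj : obj[key]), data)
      postMessage({ type: 'value', path: msg.path, value })
    }
  }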


> it implicitly does a JSON.stringify and a JSON.parse back on the main thread (technically it's called a "structured clone", but it's mostly the same thing)

Except, funnily enough, JSON.stringify + JSON.parse is usually the recommendation, as it's either comparable to or faster than the structured clone the engine itself does :/

Web workers are depressingly bad...


You might be interested in a tool I wrote to serve .har files called server-replay: https://github.com/Stuk/server-replay

It also allows you to overlay local files, so you can change code while reusing server responses.



