
I think this fragment captures the spirit of this piece:

A good rule of thumb is to apply this technique for objects of 10 kB or larger — but as always with performance advice, measure the actual impact before making any changes.

Although it may still not be worth it. At work I have a hand-rolled utility for mocking the backend using a .har file (which is just JSON). I use it to reproduce bugs found by the testers, who are kind enough to supply me with both such a file and a screencast.

On a MacBook Pro a 2.6MB .har file takes about 140ms to parse and process.




I find this really interesting, because at some point the absolute performance benefits of `JSON.parse` are overshadowed by the fact that it blocks the main thread.

I worked on an app a while ago which had to parse 50 MB+ JSON objects on mobile devices. In some cases (especially on mid-range and low-end devices) it would hang the main thread for a couple of seconds!

So I ended up using a library called oboe.js [1] to incrementally parse the massive JSON blobs, putting liberal `setTimeout`s between steps to avoid hanging the main thread for more than about 200ms at a time.

This meant it would often take 5x longer to fully parse the JSON blob than just using `JSON.parse`, but it was a much nicer UX: the UI never hung or froze during that process (at least perceptibly), and the user wasn't waiting on the parsing to use the app, since there was still more user input I needed from them at that point. So even though the parse now often took 15+ seconds, the user was usually spending 30+ seconds inputting more information, and the UI stayed fluid the whole time.
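
For anyone curious what that looks like in practice, here's a rough sketch of the pattern (the URL, the `items.*` selector, and `transform` are illustrative, not the actual app code): oboe emits nodes as they stream in, the callback only queues them, and the expensive per-item work is drained in small time-boxed batches so the main thread is never blocked for long.

  const pending = []
  const results = []
  let streamDone = false

  oboe('/api/big-blob.json')
    .node('items.*', item => {
      pending.push(item)   // keep the callback cheap: just queue the item
      return oboe.drop     // let oboe discard the raw JSON for this node
    })
    .done(() => { streamDone = true })

  function drainChunk() {
    const started = Date.now()
    // work for at most ~200ms, then yield back to the event loop
    while (pending.length && Date.now() - started < 200) {
      // transform() stands in for whatever per-item work the app needed
      results.push(transform(pending.shift()))
    }
    if (pending.length || !streamDone) setTimeout(drainChunk, 0)
  }
  drainChunk()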


If you really need to work with large data files (1 MB+), JSON is a terrible format. You should look into FlatBuffers. It's like having indexed JSON where there is no parsing cost. You can have millions of rows and nested objects, and it will only read the bytes it needs.

It is a length-prefix-encoded format, so it's pretty safe to work with in a streaming manner too.


Good article on how Facebook used them for their mobile app:

https://code.fb.com/android/improving-facebook-s-performance...


Why not just use promises?

side note: legit question, I don't do web/app dev


Because JSON.parse blocks the thread it's in, and JS is single threaded [1].

So even if you put it behind a promise, when that promise actually runs, it will block the thread.

In essence, using promises (or callbacks or timeouts or anything else like that) allows you to delay the thread-blocking, but once the code hits `JSON.parse`, no other javascript will run until it completes. And since no other javascript will run, the UI is entirely unresponsive during that time as well.
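
A quick illustration (just a sketch, `hugeJsonString` stands in for whatever large payload you have): wrapping the parse in a promise only changes when it starts, not where it runs.

  const parsePromise = text =>
    new Promise(resolve => resolve(JSON.parse(text)))  // still runs on the main thread

  parsePromise(hugeJsonString).then(data => {
    // by the time we get here, the UI has already been frozen
    // for however long JSON.parse took
  })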

[1] Technically there are web-workers, and I looked into them to try and solve this problem. Unfortunately any complex objects that get sent to or from a worker need to be serialized (no pass-by-reference is allowed except for a very small subset of "C style" arrays called TypedArrays). So while you could technically send the string to a worker and have the worker call `JSON.parse` on it to get an object, when you go to pass that object back the javascript engine will need to do an "implicit" `JSON.stringify` in the worker, then a `JSON.parse` in the main thread. Making it entirely useless for my use case.

But continuing with that same thought process, I very nearly went for an architecture that used a web-worker, did the `JSON.parse` in the worker, then exposed methods that could be called from the main thread to get small amounts of data out of the worker as needed. Something like `worker.getProperty('foo.bar.baz')` which would only take the parsing hit for very small subsets of the data at a time. But ultimately the oboe.js solution was simpler and faster at runtime.


Another trick most people don't realize: not only is the fetch API asynchronous, but response.json() does the conversion in a background thread and is non-UI-blocking.

If you have a large JSON object, you can use the fetch API to work with it. If you need to cache it, use the Cache Storage API. Unlike localStorage, which will freeze the UI, cache storage won't.

It's slightly slower since it needs to talk to another thread, but who cares, as long as the UI stays responsive to do other things.


This is a common misconception, but response.json() still blocks the main thread.

It looks like it doesn't, but the exact same symptoms will happen even while awaiting fetch's json().


I'm guessing with Oboe.js you solved this by capturing a stream(?) of JSON but only parsing relevant chunks as they appear and match the selector? Or do you simply load the larger chunks at once (either by a request or embedding JSON into the template server side) instead of streaming?

http://oboejs.com/examples#demarshalling-json-to-an-oop-mode...

I could see the value in this for sure. I currently have a problem loading a ton of JS for some users who have thousands of objects embedded in the view via Rails' toJSON() in a <script> tag. It's creating far too much weight on the frontend. I've been considering fetching it via a simple REST request instead.


Thank you for the excellent explanation!

I think of js entirely from a node.js perspective where I conceptualize it as an async task. Is this also wrong?


Node suffers from the same issues, but it's generally not as noticeable in most cases. A similar situation in Node would cause the server to not be able to respond to any other requests during the `JSON.parse` execution. But in the Node world, you have more options for getting around those problems (like load balancing requests among several Node processes).

But both server-side and client-side JS use the same system: the event loop. It's basically a message queue of events that get stacked up, and the JS engine grabs the oldest event in that queue, one at a time, and processes it to completion. Anything "async" just throws a new event into that queue to be processed. The secret sauce is that any IO is done "outside" the JS execution, so other events can be processed while the IO is waiting to complete.

Take a look at this link, or search for the JS event loop if you want a better explanation. It's deceptively simple.

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Even...
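
A tiny illustration of that behaviour (just a sketch): a queued timer event can't run until the current synchronous task finishes, which is exactly why a big `JSON.parse` freezes everything.

  const start = Date.now()
  setTimeout(() => console.log('timer fired after', Date.now() - start, 'ms'), 0)

  // busy-wait standing in for a long synchronous task like a huge JSON.parse
  while (Date.now() - start < 2000) {}

  // logs roughly "timer fired after 2000 ms", not 0: the queued event had to wait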


> Is this also wrong?

Yes, the Node.js JavaScript runtime is based on V8, the same engine that runs in Chrome. JavaScript is single threaded, so anything that is not I/O bound will block the main thread. If you don't want to block the thread because you have a long-running calculation/parsing task, you can use worker threads [1]. This will run your task in a separate thread and not block the main one.

[1] https://nodejs.org/dist/latest-v12.x/docs/api/worker_threads...
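
A minimal sketch of that (Node 12+, using the worker_threads API from the link above; `hugeJsonString` is just a stand-in). Note the caveat in the reply below about how the parsed result gets copied back:

  const { Worker, isMainThread, parentPort, workerData } = require('worker_threads')

  if (isMainThread) {
    // main thread: hand the raw string to a worker and wait for the parsed result
    const worker = new Worker(__filename, { workerData: hugeJsonString })
    worker.on('message', parsed => console.log('parsed off the main thread', parsed))
  } else {
    // worker thread: the parse happens here, off the main thread
    parentPort.postMessage(JSON.parse(workerData))
  }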


And not to beat a dead horse, but worker threads again wouldn't work in this exact situation even in Node.js. They suffer from the same problems that web workers do, meaning they use a structured clone algorithm to send data between workers (with the exception of TypedArrays), and therefore would hang the "main thread" just as long as if you did the `JSON.parse` directly in it.

It's a really annoying problem, and I'm actually really happy to see that many others have the exact same thoughts I had at the time, and that I wasn't just missing something obvious!


Fun detail: node internally will use thread pools to do CPU-intensive tasks that would normally block the main thread.

For example: https://github.com/nodejs/node/blob/master/src/node_crypto.c...

I generally use that as an example when explaining to people why Node isn't a great fit for a lot of workloads. They have to use these features internally, but you as the user with a CPU-intensive job don't have access to those features.


Maybe the worker could parse the JSON to build an index and then send over just the index. The main thread could then use the index to access small substrings of the original giant JSON string, parse those and cache the result?
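
Something like this, maybe (purely hypothetical sketch, assuming the payload is a top-level array of objects): the worker scans the raw text once, records the [start, end) offsets of each element, and only that small index crosses the thread boundary.

  // worker.js (the main thread posts the raw JSON string here first)
  onmessage = ({ data: text }) => {
    const index = []
    let depth = 0, start = -1, inString = false, escaped = false
    for (let i = 0; i < text.length; i++) {
      const c = text[i]
      if (inString) {
        if (escaped) escaped = false
        else if (c === '\\') escaped = true
        else if (c === '"') inString = false
      } else if (c === '"') inString = true
      else if (c === '{' || c === '[') {
        depth++
        if (depth === 2) start = i          // start of a top-level array element
      } else if (c === '}' || c === ']') {
        if (depth === 2) index.push([start, i + 1])
        depth--
      }
    }
    postMessage(index)                      // small, so cheap to copy back
  }

  // main thread: keep the original string, parse one record lazily when needed
  // const [s, e] = index[n]
  // const record = JSON.parse(text.slice(s, e))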


What if you used multiple async AJAX requests to load different parts of the UI instead of loading all 50MB at once? Could that be what the OP meant by "promiseS"?


Promises are a way to deal with async code. Parsing JSON is synchronous and CPU-bound, so promises offer no benefit. And since web pages are single-threaded[0], there isn't really any way you can parse JSON in the background and wait on the result.

[0]: There is now the Web Workers API which does allow you to run code in the background. I've never used it, but I have heard that it has a pretty high overhead since you have to communicate with it through message passing, so it's possible you wouldn't actually gain anything by using it to parse a large JSON object.

https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers...


Promises still run on "the main thread", so a CPU-intensive task in a promise is still going to block things. You could use a promise if you delegated your CPU task to another process or some C code that did actual threading.

I believe you'd use Workers (WebWorkers?) https://developer.mozilla.org/en-US/docs/Web/API/Workers to actually do it off the main thread entirely inside JS.


Promises still run in the main thread.

You could try to use a web worker, but then you run into the problem that they don't have shared memory, so you need to pass data back some other way.


IndexedDB is available in web workers, so that might be a good way to push the work off the main thread but still share the result.


JSON.parse is not interruptible. All the answers about promises and single threading are interesting but that's the crux of the issue.


Because JS is single threaded. If a task is too big you need to split it or the UI can't be updated until the task is done.


Is that relevant when comparing parsing JSON with parsing literal objects? I don't know much about JavaScript engines, but I'd expect that parsing literal objects in code also blocks the main thread.


I would probably break that up similarly to how you did in that case as well, though I might use multiple server requests (chunks) and/or a websocket for the data feed.

What was the memory overhead for the application?


I don't remember the details of the memory usage (it was a few years ago now), but I was pleasantly surprised to see that it wasn't nearly as bad as I first assumed it would be.

And I did originally plan on using something like a websocket, but it turns out that with some minor changes on the server side we could start streaming data while it was still being gathered, and oboe.js is actually able to start parsing data even while it's still downloading from a normal XHR request, and is designed to be as efficient as possible (so it throws away string data as soon as it's no longer needed).

So there weren't really any additional benefits to be had from using websockets, and breaking it up into multiple distinct requests would probably have been slower!

(I just realized I forgot to add a link to oboe.js! But I highly recommend it. It seems it's just gotten better since the last time I used it.)

[1] http://oboejs.com


I'm asking purely out of curiosity - what was the content of such a large JSON object?


It was a carton scanning app, so basically a massive array of objects (carton data) which were needed so the app could function and route cartons and validate deliveries entirely offline. Due to some unfortunate limitations from our clients and some edge cases, we couldn't filter down the data on the server ahead of time. So we ended up having to keep that massive amount of data on the device, and at the end of the day 95% of it would be unused, but we wouldn't know which 95% until the device was already offline.

It was a system where the goalposts moved many times during development. If I were to do it again, I wouldn't use JSON, but after having the goals change a few times and then having the original server-side components get co-opted for other projects, it was hard to justify the time it would take to switch to a different, more appropriate wire format.


I kept reading that as cartoons and I was just so confused for a second...


You can also pretty easily use a web worker now; they work well. Here's [1] an example with React hooks.

Example Fibonacci worker code that doesn't block the UI, even for larger calculations:

  // worker script: naive recursive Fibonacci, deliberately CPU-heavy
  const fib = n => (n < 2 ? n : fib(n - 1) + fib(n - 2))
  
  // receive n from the main thread, compute off-thread, post the result back
  onmessage = msg => {
    console.log('fibonacci worker onmessage', msg)
    postMessage({ num: msg.data, result: fib(msg.data) })
  }
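
And the main-thread side would be something along these lines (assuming the code above is saved as fib-worker.js):

  const worker = new Worker('fib-worker.js')
  worker.onmessage = e => console.log('fib', e.data.num, '=', e.data.result)
  worker.postMessage(40)   // the UI stays responsive while the worker crunches
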
[1] https://github.com/bharathnayak03/react-webworker-hook


Web workers won't work in this case because they need to serialize all data going into and out of them (with the exception of TypedArrays).

So passing a string to a worker and having it JSON.parse it works great. But when you go to pass that object back to the main thread, it implicitly does a JSON.stringify and a JSON.parse back on the main thread (technically it's called a "structured clone", but it's mostly the same thing), putting you in the exact same situation.


Good to know, thanks for clarifying.


And thanks to you for helping show me that I wasn't the only one to try that!

This whole thread has been really nice to read, because I beat my head against a wall for a long time before I finally found a solution, and I'm glad to read that I wasn't the only one who found this a lot harder than it looked at first glance (or second, or third...)


If this really NEEDS to be a client-side-only solution, I still believe a worker is the only way to go. Only, in this case, it needs to behave like an API. So your worker not only parses the JSON, but also responds to post messages with only the data that is requested from it.

Why? Because a large JSON structure is most probably just a large JSON structure, but you most probably don't need it as a whole. You may need a total count of items, you may need a paginated set of items, or only a certain item or a set of fields of items — well, an API.
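
A rough sketch of what that could look like (the message shapes and the `items` field are made up for illustration):

  // worker.js: parse once here, then answer small queries so only tiny
  // payloads ever cross the thread boundary
  let data = null

  onmessage = ({ data: msg }) => {
    if (msg.type === 'load') {
      data = JSON.parse(msg.text)
      postMessage({ type: 'ready', count: data.items.length })
    } else if (msg.type === 'get') {
      // e.g. { type: 'get', path: ['items', 42, 'name'] }
      const value = msg.path.reduce((obj, key) => (obj == null ? obj : obj[key]), data)
      postMessage({ type: 'value', path: msg.path, value })
    }
  }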


> it implicitly does a JSON.stringify and a JSON.parse back on the main thread (technically it's called a "structured clone", but it's mostly the same thing)

Except, funnily enough, JSON.stringify + JSON.parse is usually the recommendation, as it's either comparable to or faster than the structured clone the engine itself does :/

Web workers are depressingly bad...


You might be interested in a tool I wrote to serve .har files called server-replay: https://github.com/Stuk/server-replay

It also allows you to overlay local files, so you can change code while reusing server responses.



