This (and the older blog post about backing stores[0]) is incredibly relevant to my LZ-string optimisations[1]. LZ-string takes an input string, builds a dictionary of substrings as it scans over the input, outputs substring tokens to a bit-stream, then converts that bit-stream into a new string.
How that dictionary is built is the most relevant part here: it is the biggest bottleneck determining the speed of the algorithm. I have tried many variants. The ones that work best match what is described here:
- instead of naively using sub-strings and dictionary look-ups, build a trie[2] (putting all sub-strings into one object used as a flat dictionary quickly overflows the 1022-element limit)
- instead of character strings, use charCodeAt[3] to get integer keys for the trie look-up (this indexes with elements rather than hash codes, and because JS strings are UTF-16, the keys are SMIs. It also removes string-creation overhead).
- for a long time, I used one object for each node, with 0 as the key to look up its value, and charCode+1 to look up next nodes. In recent benchmarks it turned out that { v: val, d: {} } is faster[4][5][6]. This makes sense because: "we don't store information about present indexed properties on the HiddenClass" [0]. In other words: the new structure creates a distinct hidden class compared to a plain object, which can be optimised. At the same time, adding integer keys to the separate d object will not change this hidden class, preventing de-opts.
- for really fast performance except in a few degenerate cases, instead of key look-up, use an array with linear search (yes, really! Turns out compact arrays without holes are faster to the point where even linear search makes it worth it - up to a certain length).
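To make the { v: val, d: {} } point concrete, here is a minimal sketch of that node shape (this is illustrative, not the actual lz-string code; the function names are mine):

```javascript
// Sketch of a trie node as { v, d }: `v` holds the node's token value,
// `d` maps charCode integers to child nodes. All integer keys live in
// the separate `d` object, so the node's own hidden class stays stable.
function createNode(value) {
  return { v: value, d: {} };
}

// Insert a string into the trie, keyed by charCodeAt integers.
function insert(root, str, value) {
  let node = root;
  for (let i = 0; i < str.length; i++) {
    const c = str.charCodeAt(i); // integer (SMI) key, no substring allocation
    let next = node.d[c];
    if (next === undefined) {
      next = createNode(undefined);
      node.d[c] = next;
    }
    node = next;
  }
  node.v = value;
  return node;
}

// Walk the trie along `str`, returning the value of the longest
// known prefix (or undefined if none matches).
function lookup(root, str) {
  let node = root, lastValue = root.v;
  for (let i = 0; i < str.length; i++) {
    const next = node.d[str.charCodeAt(i)];
    if (next === undefined) break;
    node = next;
    if (node.v !== undefined) lastValue = node.v;
  }
  return lastValue;
}

const root = createNode(undefined);
insert(root, "ab", 1);
insert(root, "abc", 2);
```

The point of the split is that every node shares one hidden class ({ v, d }), while the charCode keys only ever touch the d objects.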
So this explains a lot of why certain approaches work better than others. Some of the explanations here might let me optimise it even further, but at this point it would become hyper-optimised for V8. Are there some take-aways that I can generalise? And is there a similar blog for the other big JavaScript engines?
I expect that "type stability" - that is, maintaining identical hidden classes - is fairly universal for example.
> I expect that "type stability" - that is, maintaining identical hidden classes - is fairly universal for example.
To go into this a bit further: I have a piece of code for rendering scatterplots in Canvas. For big data-sets it starts out slow, but after one or two renders the JIT kicks in and it really speeds it up[0]. However, I recently noticed that when I switch between input data a lot, rendering slows down again tremendously.
I suspect the main reason is the following: the input data (for x/y coordinates, color, etc) can be any of the various TypedArrays. I figured that all else being equal, using the most compact array type possible would give the best cache benefits. However, this means the shape of the array changes based on the data used as input, prompting de-opts in the plotting functions, undoing the JIT-benefits.
So now my thinking is that maybe, the best option is refactoring the plotting function into a pre-processing function that converts the input data to arrays of consistent type (Float32Arrays, for example), and the actual plotting code that always receives the same type of array.
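As a sketch of what I mean (names like plotPoints are hypothetical stand-ins, not my actual code):

```javascript
// Normalise any numeric input array to Float32Array once, up front,
// so the hot plotting function only ever sees one array type.
function toFloat32(arr) {
  return arr instanceof Float32Array ? arr : Float32Array.from(arr);
}

// Entry point: convert first, then call the type-stable inner function.
function plot(xs, ys) {
  return plotPoints(toFloat32(xs), toFloat32(ys));
}

// Always receives Float32Arrays, so the JIT sees a stable input shape
// and should not de-opt when the original data source changes.
function plotPoints(xs, ys) {
  let count = 0;
  for (let i = 0; i < xs.length; i++) {
    // ...draw point (xs[i], ys[i]) to the canvas here...
    count++;
  }
  return count;
}
```

The conversion costs one copy per render, which should be cheap compared to repeatedly de-optimising the plotting loop itself.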
Furthermore, I wonder how much of a difference it would make if one copies the same function once for each array type, and uses a "type switch" to ensure that the hidden class of the input to each of these functions is constant:
function funcFactory(){
  // each call returns a fresh closure, so the engine gathers
  // separate type feedback for each array type
  return (arr) => {
    // do stuff here
  };
}

const uint8Func = funcFactory();
const uint16Func = funcFactory();
const float32Func = funcFactory();
const arrayFunc = funcFactory();

function process(arr){
  switch (arr.constructor) {
    case Uint8Array:
      return uint8Func(arr);
    case Uint16Array:
      return uint16Func(arr);
    case Float32Array:
      return float32Func(arr);
    default:
      // remaining array types
      return arrayFunc(arr);
  }
}
I'd have to test this, curious to see if it will make a big difference or not.
I would expect that this is where WebAssembly could offer great help. You could write your algorithm against flat memory rather than the unpredictable data structures that JavaScript offers. Or port one from C. (https://github.com/kripken/lzma.js)
It would, but for the sake of backwards compatibility, these kinds of optimisations don't hurt
Similarly, one reason I'm using canvas instead of WebGL is because it's very easy to have hundreds of canvas elements on a page (for example when plotting hundreds of little barplots).
WebGL, afaik, is not designed for opening more than one WebGL context per page, and requires some trickery to work around this. For example, this regl-in-idyll example opens one "transparent" WebGL overlay over the whole screen and "scrolls" along with the actual webpage:
Which, admittedly, is a really cool hack! However, if you then take into account that my employees like to print their webpages, it turns into a pain again...
[0] https://v8project.blogspot.se/2017/08/fast-properties.html
[1] https://github.com/pieroxy/lz-string/pull/98
[2] https://en.wikipedia.org/wiki/Trie
[3] https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...
[4] https://run.perf.zone/view/LZ-String-2018-01-18-v2-151619850...
[5] https://run.perf.zone/view/LZ-String-2018-01-18-v1-151619752...
[6] https://github.com/pieroxy/lz-string/pull/98/commits/1894f7c...