According to their benchmark, multi-threaded rendering on a Ryzen 7950X takes about 1.7 ms with 4 cores to draw 1000 polygons with 40 vertices each on a 32x32 px area, which seems like a reasonable approximation of a text character on a high-DPI display. The default font size in JetBrains IDEs fits about 2500 characters onto my screen, so I'd expect a 4.25 ms frame time, meaning I'm capped at about 235 FPS with 4 CPU cores running at full speed.
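To make the arithmetic explicit (just a back-of-the-envelope check using the numbers above):

    // Frame-time estimate from the benchmark numbers quoted above.
    constexpr double msPer1000Polys = 1.7;    // 4 cores, 32x32 px, 40 vertices each
    constexpr double charsOnScreen  = 2500;   // JetBrains default font size
    constexpr double frameTimeMs    = charsOnScreen / 1000.0 * msPer1000Polys; // 4.25 ms
    constexpr double maxFps         = 1000.0 / frameTimeMs;                    // ~235 FPS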
I believe the best way is probably to use Blend2D for rendering glyph bitmaps and then compositing them into the full text on GPU.
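A minimal sketch of the CPU half of that idea, using Blend2D's public C++ API (the GPU upload and compositing are left out since they depend on the graphics API, and the baseline placement here is deliberately crude):

    #include <blend2d.h>

    // Render one glyph (here just a single UTF-8 character) into a small
    // premultiplied-ARGB bitmap that could then be uploaded as a GPU texture.
    BLImage renderGlyphBitmap(const BLFont& font, const char* utf8Char, int w, int h) {
      BLImage img(w, h, BL_FORMAT_PRGB32);
      BLContext ctx(img);
      ctx.clearAll();                                        // transparent background
      ctx.fillUtf8Text(BLPoint(0, h * 0.8), font, utf8Char); // rough baseline
      ctx.end();
      return img;
    }

The upload of each bitmap's pixel data to a texture is exactly the CPU-to-GPU copy whose cost is the concern below.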
Sadly, CPU memory is still quite slow compared to GPU memory, and when you need to copy around 100 MB images (4K RGB float), that quickly becomes the limiting factor.
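For scale, the arithmetic behind that figure:

    #include <cstddef>

    // 4K RGB with 32-bit float channels.
    constexpr size_t kBytes = size_t(3840) * 2160 * 3 * sizeof(float); // 99,532,800 B, ~100 MB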
Text rendering is something that will get improved in the future.
At the moment, when you render text, Blend2D queries each character from the font, rasterizes all the edges, and runs a pipeline to composite them. All of these steps are heavily optimized (there is even a SIMD-accelerated TrueType decoder, which I recently ported to AArch64), so when you compare this approach against other libraries you still see a 4-5x performance difference in favor of Blend2D. But compared against cached glyphs, Blend2D loses, as it has to do much more work per glyph.
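Spelled out with the public API, the stages that paragraph describes look roughly like this (a sketch; the coordinates are arbitrary):

    #include <blend2d.h>

    void drawText(BLContext& ctx, const BLFont& font, const char* text) {
      BLGlyphBuffer gb;
      gb.setUtf8Text(text); // map characters to glyph ids
      font.shape(gb);       // GSUB/GPOS processing (substitution, positioning)
      // fillGlyphRun decodes each outline (the SIMD TrueType decoder),
      // rasterizes the edges, and runs the compositing pipeline -
      // every glyph, every time, with no cache in between.
      ctx.fillGlyphRun(BLPoint(16, 48), font, gb.glyphRun());
    }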
So the plan is to use the existing pipeline for glyphs that are larger (let's say 30px+ vertically) and caching for glyphs that are smaller. How the caching will work is still being researched, as I don't consider simple glyph caching in a mask a great solution: a cached mask cannot be sub-pixel positioned and cannot be rotated, and if you want sub-pixel positioning, the cache has to store each glyph several times.
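To illustrate why sub-pixel positioning multiplies the cache size, here is a hypothetical key for such a mask cache (illustrative only, not Blend2D code):

    #include <cstddef>
    #include <cstdint>
    #include <unordered_map>
    #include <vector>

    // Supporting N horizontal sub-pixel phases stores each (glyph, size)
    // pair up to N times; rotation cannot be expressed as a key at all.
    struct GlyphKey {
      uint32_t glyphId;
      uint16_t pixelSize;     // quantized font size
      uint8_t  subpixelPhase; // e.g. 0..3 for quarter-pixel horizontal steps

      bool operator==(const GlyphKey& other) const {
        return glyphId == other.glyphId &&
               pixelSize == other.pixelSize &&
               subpixelPhase == other.subpixelPhase;
      }
    };

    struct GlyphKeyHash {
      size_t operator()(const GlyphKey& k) const {
        return (size_t(k.glyphId) * 31u + k.pixelSize) * 31u + k.subpixelPhase;
      }
    };

    // glyph key -> 8-bit alpha mask
    using GlyphMaskCache = std::unordered_map<GlyphKey, std::vector<uint8_t>, GlyphKeyHash>;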
There is a demo application in the blend2d-apps repository that can be used to compare Blend2D text rendering against Qt, and the caching Qt does is clearly visible in this demo: when the text is smaller, Qt renders it differently, and characters can "jump" from one pixel to another when the font size is scaled slightly up and down. So Qt's glyph caching has its limits, and it's not nice when you render animated text, for example. This is a property I consider very important, which is why I want to design something better than glyph masks that would still be simple to compute on the CPU. One additional interesting property of Qt's glyph caching is that once you want to render text at a size that was not cached previously, something in Qt takes 5 ms to set up, which is insane...
BTW, one nice property of Blend2D text rendering is that when you use the multithreaded rendering context, the whole text pipeline runs multithreaded as well (all the outline decoding, GSUB/GPOS processing, rasterization, etc.).
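For reference, opting into the multithreaded context is just a matter of the create info (a minimal sketch):

    #include <blend2d.h>

    int main() {
      BLImage img(3840, 2160, BL_FORMAT_PRGB32);

      // Request worker threads; commands are then batched and executed
      // asynchronously by the thread pool, including the text pipeline.
      BLContextCreateInfo createInfo {};
      createInfo.threadCount = 4;

      BLContext ctx(img, createInfo);
      ctx.clearAll();
      // ... fillUtf8Text() / fillGlyphRun() calls go through the MT pipeline ...
      ctx.end(); // flushes and waits for pending work
      return 0;
    }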