According to their benchmark, multi-threaded rendering on a Ryzen 7950X takes about 1.7 ms with 4 cores to draw 1000 polygons with 40 vertices each on a 32x32 px area, which seems like a reasonable approximation of a text character on a high-DPI display. The default font size in JetBrains IDEs fits about 2500 characters onto my screen, so I'd expect a 4.25 ms frame time, meaning I'm capped at about 235 FPS with 4 CPU cores running at full speed.
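To make the arithmetic explicit (just a back-of-the-envelope check using the numbers above):

    // Frame-time estimate from the benchmark numbers quoted above.
    constexpr double msPer1000Polys = 1.7;    // 4 cores, 32x32 px, 40 vertices each
    constexpr double charsOnScreen  = 2500;   // JetBrains default font size
    constexpr double frameTimeMs    = charsOnScreen / 1000.0 * msPer1000Polys; // 4.25 ms
    constexpr double maxFps         = 1000.0 / frameTimeMs;                    // ~235 FPS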
I believe the best way is probably to use Blend2D for rendering glyph bitmaps and then compositing them into the full text on GPU.
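A minimal sketch of the CPU half of that idea, using Blend2D's public C++ API (the GPU upload and compositing are left out since they depend on the graphics API, and the baseline placement here is deliberately crude):

    #include <blend2d.h>

    // Render one glyph (here just a single UTF-8 character) into a small
    // premultiplied-ARGB bitmap that could then be uploaded as a GPU texture.
    BLImage renderGlyphBitmap(const BLFont& font, const char* utf8Char, int w, int h) {
      BLImage img(w, h, BL_FORMAT_PRGB32);
      BLContext ctx(img);
      ctx.clearAll();                                        // transparent background
      ctx.fillUtf8Text(BLPoint(0, h * 0.8), font, utf8Char); // rough baseline
      ctx.end();
      return img;
    }

The upload of each bitmap's pixel data to a texture is exactly the CPU-to-GPU copy whose cost is the concern below.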
Sadly, CPU memory is still quite slow compared to GPU memory, and when you need to copy around 100 MB images (4K RGB float), that quickly becomes the limiting factor.
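For scale, the arithmetic behind that figure:

    #include <cstddef>

    // 4K RGB with 32-bit float channels.
    constexpr size_t kBytes = size_t(3840) * 2160 * 3 * sizeof(float); // 99,532,800 B, ~100 MB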
Text rendering is something that will get improved in the future.
At the moment, when you render text, Blend2D queries each character from the font, rasterizes all the edges, and runs a pipeline to composite them. All of these steps are heavily optimized (there is even a SIMD-accelerated TrueType decoder, which I recently ported to AArch64), so when you compare this approach against other libraries you still see a 4-5x performance difference in favor of Blend2D. But compared against cached glyphs, Blend2D loses, as it has to do much more work per glyph.
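Spelled out with the public API, the stages that paragraph describes look roughly like this (a sketch; the coordinates are arbitrary):

    #include <blend2d.h>

    void drawText(BLContext& ctx, const BLFont& font, const char* text) {
      BLGlyphBuffer gb;
      gb.setUtf8Text(text); // map characters to glyph ids
      font.shape(gb);       // GSUB/GPOS processing (substitution, positioning)
      // fillGlyphRun decodes each outline (the SIMD TrueType decoder),
      // rasterizes the edges, and runs the compositing pipeline -
      // every glyph, every time, with no cache in between.
      ctx.fillGlyphRun(BLPoint(16, 48), font, gb.glyphRun());
    }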
So the plan is to use the existing pipeline for glyphs that are larger (let's say 30px+ vertically) and caching for glyphs that are smaller. How the caching will work is still being researched, as I don't consider simple glyph caching in a mask a great solution: a cached mask cannot be sub-pixel positioned and cannot be rotated, and if you want sub-pixel positioning, the cache has to store each glyph several times.
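To illustrate why sub-pixel positioning multiplies the cache size, here is a hypothetical key for such a mask cache (illustrative only, not Blend2D code):

    #include <cstddef>
    #include <cstdint>
    #include <unordered_map>
    #include <vector>

    // Supporting N horizontal sub-pixel phases stores each (glyph, size)
    // pair up to N times; rotation cannot be expressed as a key at all.
    struct GlyphKey {
      uint32_t glyphId;
      uint16_t pixelSize;     // quantized font size
      uint8_t  subpixelPhase; // e.g. 0..3 for quarter-pixel horizontal steps

      bool operator==(const GlyphKey& other) const {
        return glyphId == other.glyphId &&
               pixelSize == other.pixelSize &&
               subpixelPhase == other.subpixelPhase;
      }
    };

    struct GlyphKeyHash {
      size_t operator()(const GlyphKey& k) const {
        return (size_t(k.glyphId) * 31u + k.pixelSize) * 31u + k.subpixelPhase;
      }
    };

    // glyph key -> 8-bit alpha mask
    using GlyphMaskCache = std::unordered_map<GlyphKey, std::vector<uint8_t>, GlyphKeyHash>;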
There is a demo application in the blend2d-apps repository that can be used to compare Blend2D text rendering against Qt, and the caching Qt does is clearly visible in this demo: when the text is smaller, Qt renders it differently, and characters can "jump" from one pixel to another when the font size is scaled slightly up and down. So Qt's glyph caching has its limits, and it's not nice when you render animated text, for example. This is a property I consider very important, which is why I want to design something better than glyph masks that would still be simple to compute on the CPU. One additional interesting property of Qt's glyph caching is that once you want to render text at a size that was not cached previously, something in Qt takes 5 ms to set up, which is insane...
BTW, one nice property of Blend2D text rendering is that when you use the multithreaded rendering context, the whole text pipeline runs multithreaded as well (all the outline decoding, GSUB/GPOS processing, rasterization, etc.).
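For reference, opting into the multithreaded context is just a matter of the create info (a minimal sketch):

    #include <blend2d.h>

    int main() {
      BLImage img(3840, 2160, BL_FORMAT_PRGB32);

      // Request worker threads; commands are then batched and executed
      // asynchronously by the thread pool, including the text pipeline.
      BLContextCreateInfo createInfo {};
      createInfo.threadCount = 4;

      BLContext ctx(img, createInfo);
      ctx.clearAll();
      // ... fillUtf8Text() / fillGlyphRun() calls go through the MT pipeline ...
      ctx.end(); // flushes and waits for pending work
      return 0;
    }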