Hacker News

All those writes happen in big blocks with other updates, so the number of writes per update is still relatively low. Compare that to, e.g., Postgres, where a single update causes quite a few page writes that probably don't get batched up: add the new row version to the heap, update the old row version in the heap, maybe update indexes, later vacuum the old tuple in the heap, and maybe vacuum indexes.



A single durable update is the worst case for a classic B-tree with no WAL, though, and it is where the classic LSM-tree has its advantage. You can't use it to show a consistent write advantage for LSM-trees, except in applications that only do that kind of update.

If you write (say) 1000 small, contiguous rows in a single transaction, the B-tree wins. The B-tree writes a few full leaf pages and then updates just enough interior pages to point to them. The LSM-tree writes the rows to the first level, then later writes them again to the second level, then to the third level, and so on: N times for N levels. The data also has to be read N-1 times along the way.
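As a rough sanity check on that arithmetic, here is a small sketch comparing pages written for a contiguous batch. The page capacity, fanout, and level count are made-up illustrative numbers, not any real engine's parameters:

```python
import math

def btree_pages_written(rows, rows_per_page=100, fanout=100):
    """Pages a B-tree writes for a batch of contiguous rows:
    full leaf pages, plus the interior pages updated to point
    at them, one level at a time up the tree."""
    pages = math.ceil(rows / rows_per_page)
    total = pages
    while pages > 1:
        pages = math.ceil(pages / fanout)
        total += pages
    return total

def lsm_pages_written(rows, rows_per_page=100, levels=4):
    """Pages a classic LSM-tree writes for the same batch:
    each row is rewritten once per level as compaction pushes
    it down, so the batch is written `levels` times."""
    return math.ceil(rows / rows_per_page) * levels

# 1000 contiguous rows: the B-tree writes ~11 pages,
# the 4-level LSM-tree writes ~40.
print(btree_pages_written(1000), lsm_pages_written(1000))
```

With these toy numbers the B-tree writes 10 leaf pages plus 1 interior page, while the LSM-tree rewrites the same 10 pages once per level.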

At (say) 100,000 contiguous rows in a single transaction, the classic LSM-tree is still poor, but a modern LSM-tree with multiple files in the bulk level, and with special handling of bulk loads that bypasses the upper levels and writes directly to the logical middle of the bulk level, has I/O performance similar to the B-tree, so it catches up. I think RocksDB has this, but you have to request it explicitly rather than it detecting when to do so automatically. It probably also has lower layout discontiguity if it works by writing to files or large contiguous zones. But if we are comparing an LSM-tree modified for bulk loads, we may as well compare it against a B-tree with a similar modification, which can also use a zone strategy to reduce discontiguity on bulk loads.
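The bulk-load bypass described above can be sketched as follows. This is a toy model, not any real engine's API: a batch whose key range doesn't overlap any existing run can go straight into the bulk (bottom) level and be written once, while an overlapping batch takes the normal path and pays one rewrite per level during later compaction:

```python
def ingest(batch, levels):
    """batch: sorted list of (key, value) pairs.
    levels: list of levels, each a list of (min_key, max_key) runs;
    levels[-1] is the bulk level. Returns which path was taken."""
    lo, hi = batch[0][0], batch[-1][0]
    # Does the batch's key range overlap any existing run?
    overlaps = any(lo <= run_hi and run_lo <= hi
                   for level in levels
                   for (run_lo, run_hi) in level)
    if not overlaps:
        # Bypass: write the batch once, directly into the bulk level.
        levels[-1].append((lo, hi))
        return "bulk"
    # Normal path: enter at the top level; compaction will rewrite
    # the data once per level on the way down.
    levels[0].append((lo, hi))
    return "normal"
```

The point of the overlap check is correctness: the bottom level must stay sorted and non-overlapping, so only batches disjoint from existing data can skip the merge work the upper levels exist to do.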



