Hacker News

All those writes happen in big blocks with other updates, so the number of writes per update is still relatively low. Compare that to, e.g., Postgres, where a single update causes quite a few page writes that probably don't get batched up: add the new row version to the heap, update the old row version in the heap, maybe update indexes, later vacuum the old tuple in the heap, and maybe vacuum indexes.



A single durable update is the worst case for a classic B-tree with no WAL, though, and it is where the classic LSM-tree has its advantage. You can't use it to show a consistent write advantage for LSM-trees, except in applications that only do that kind of update.

If you write (say) 1000 small, contiguous rows in a single transaction, the B-tree wins. The B-tree writes a few full leaf pages and then updates just enough interior pages to point to them. The LSM-tree writes the rows to the first level, then later writes them again to the second level, then to the third level, and so on: N times for N levels. The data also has to be read N-1 times along the way.
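As a rough sanity check on that arithmetic, here is a small sketch comparing pages written for a contiguous batch. The page capacity, fanout, and level count are made-up illustrative numbers, not any real engine's parameters:

```python
import math

def btree_pages_written(rows, rows_per_page=100, fanout=100):
    """Pages a B-tree writes for a batch of contiguous rows:
    full leaf pages, plus the interior pages updated to point
    at them, one level at a time up the tree."""
    pages = math.ceil(rows / rows_per_page)
    total = pages
    while pages > 1:
        pages = math.ceil(pages / fanout)
        total += pages
    return total

def lsm_pages_written(rows, rows_per_page=100, levels=4):
    """Pages a classic LSM-tree writes for the same batch:
    each row is rewritten once per level as compaction pushes
    it down, so the batch is written `levels` times."""
    return math.ceil(rows / rows_per_page) * levels

# 1000 contiguous rows: the B-tree writes ~11 pages,
# the 4-level LSM-tree writes ~40.
print(btree_pages_written(1000), lsm_pages_written(1000))
```

With these toy numbers the B-tree writes 10 leaf pages plus 1 interior page, while the LSM-tree rewrites the same 10 pages once per level.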

At (say) 100,000 contiguous rows in a single transaction, the classic LSM-tree is still poor, but a modern LSM-tree with multiple files in the bulk level, and with special handling of bulk loads that bypasses the upper levels and writes directly to the logical middle of the bulk level, has I/O performance similar to the B-tree, so it catches up. I think RocksDB has this, but you have to request it explicitly rather than it detecting when to do so automatically. It probably also has lower layout discontiguity if it works by writing to files or large contiguous zones. But if we are comparing an LSM-tree modified for bulk loads, we may as well compare it against a B-tree with a similar modification, which can also use a zone strategy to reduce discontiguity on bulk loads.
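The bulk-load bypass described above can be sketched as follows. This is a toy model, not any real engine's API: a batch whose key range doesn't overlap any existing run can go straight into the bulk (bottom) level and be written once, while an overlapping batch takes the normal path and pays one rewrite per level during later compaction:

```python
def ingest(batch, levels):
    """batch: sorted list of (key, value) pairs.
    levels: list of levels, each a list of (min_key, max_key) runs;
    levels[-1] is the bulk level. Returns which path was taken."""
    lo, hi = batch[0][0], batch[-1][0]
    # Does the batch's key range overlap any existing run?
    overlaps = any(lo <= run_hi and run_lo <= hi
                   for level in levels
                   for (run_lo, run_hi) in level)
    if not overlaps:
        # Bypass: write the batch once, directly into the bulk level.
        levels[-1].append((lo, hi))
        return "bulk"
    # Normal path: enter at the top level; compaction will rewrite
    # the data once per level on the way down.
    levels[0].append((lo, hi))
    return "normal"
```

The point of the overlap check is correctness: the bottom level must stay sorted and non-overlapping, so only batches disjoint from existing data can skip the merge work the upper levels exist to do.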



