> It correctly pointed out that by then, you're basically recreating your own po...

writebetterc · 2025-04-29T09:07:01 1745917621

>Benchmarks show it's about 25% faster than the raw pointer version. (I don't know why - but I suspect the reason is due to better cache locality.)

Cache locality matters, but so does having less allocator pressure. Use 32-bit unsigned ints as indices, and you get improvements on that as well.

>The original version of my b-tree works just like how you'd implement it in C. Each internal node / leaf is a raw allocation on the heap.

I'd always try to avoid that type of allocation pattern in C++, FWIW :-).

akoboldfrying · 2025-04-29T08:06:10 1745913970

> Recently I rewrote the b-tree to simply use a vec of internal nodes

Doesn't this also require you to correctly and efficiently implement (equivalents of C's) malloc() and free()? IIUC your requirements are more constrained, in that malloc() will only ever be called with a single block size, meaning you could just maintain a stack of free indices -- though if tree nodes are comparable in size to integers this increases memory usage by a significant fraction.

(I just checked and Rust has unions, but they require unsafe. So, on pain of unsafe, you could implement a "traditional" freelist-based allocator that stores the index of the next free block in-place inside the node.)

variadix · 2025-04-29T18:18:13 1745950693

Depends on if you need to allocate/deallocate nodes. If you construct the tree once and don’t modify it thereafter you don’t need to. If you do need to modify and alloc/dealloc nodes you can use a bitmap to track free/occupied slots which is very fast (find first set + bitmanip) and has minuscule overhead even for integer sized elements.
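A minimal sketch of the bitmap idea, assuming one bit per slot with 1 meaning "free". The name `SlotBitmap` and its methods are made up for illustration, and this version does not detect double frees:

```rust
/// Hypothetical slot tracker: one bit per slot, 1 = free, 0 = occupied.
pub struct SlotBitmap {
    words: Vec<u64>,
}

impl SlotBitmap {
    /// Creates a bitmap with `slots` slots, all initially free.
    pub fn new(slots: usize) -> Self {
        let mut words = vec![u64::MAX; (slots + 63) / 64];
        // Clear the padding bits in the last word so they are never handed out.
        let rem = slots % 64;
        if rem != 0 {
            if let Some(last) = words.last_mut() {
                *last = (1u64 << rem) - 1;
            }
        }
        SlotBitmap { words }
    }

    /// Claims the lowest free slot, or returns None if every slot is occupied.
    pub fn alloc(&mut self) -> Option<u32> {
        for (i, word) in self.words.iter_mut().enumerate() {
            if *word != 0 {
                let bit = word.trailing_zeros(); // "find first set"
                *word &= *word - 1; // clear the lowest set bit
                return Some(i as u32 * 64 + bit);
            }
        }
        None
    }

    /// Marks `slot` as free again. Panics if `slot` is out of range.
    pub fn free(&mut self, slot: u32) {
        self.words[slot as usize / 64] |= 1u64 << (slot % 64);
    }
}
```

The overhead is one bit per slot regardless of element size. The linear scan over words could be replaced by a second-level summary bitmap if the pool gets large.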

josephg · 2025-04-30T00:38:19 1745973499

Yeah, or just store all freed nodes in a linked list. Eg, have a pointer / index from the root to the first unused (free) node, and in that node store a pointer to the next one and so on. This is pretty trivial to implement.
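Roughly like this, as a sketch rather than the code I actually had. `Pool` and `Slot` are made-up names, and I'm using an enum instead of a union so it stays in safe Rust (at the cost of a discriminant per slot):

```rust
/// A slot either holds a live value or links to the next free slot.
enum Slot<T> {
    Occupied(T),
    Free { next_free: Option<u32> },
}

/// Hypothetical index-based pool with an in-place free list.
pub struct Pool<T> {
    slots: Vec<Slot<T>>,
    free_head: Option<u32>, // index of the first unused slot, if any
}

impl<T> Pool<T> {
    pub fn new() -> Self {
        Pool { slots: Vec::new(), free_head: None }
    }

    /// Stores `value`, reusing a freed slot when one is available.
    pub fn alloc(&mut self, value: T) -> u32 {
        match self.free_head {
            Some(idx) => {
                // Pop the head of the free list and reuse its slot.
                let old = std::mem::replace(&mut self.slots[idx as usize], Slot::Occupied(value));
                self.free_head = match old {
                    Slot::Free { next_free } => next_free,
                    Slot::Occupied(_) => unreachable!("free list points at a live slot"),
                };
                idx
            }
            None => {
                self.slots.push(Slot::Occupied(value));
                (self.slots.len() - 1) as u32
            }
        }
    }

    /// Frees the slot and returns its value; None if it was out of range or already free.
    pub fn free(&mut self, idx: u32) -> Option<T> {
        let slot = self.slots.get_mut(idx as usize)?;
        if matches!(*slot, Slot::Free { .. }) {
            return None;
        }
        // Push this slot onto the front of the free list.
        let old = std::mem::replace(slot, Slot::Free { next_free: self.free_head });
        self.free_head = Some(idx);
        match old {
            Slot::Occupied(value) => Some(value),
            Slot::Free { .. } => None,
        }
    }

    /// Returns the value at `idx` if that slot is currently occupied.
    pub fn get(&self, idx: u32) -> Option<&T> {
        match self.slots.get(idx as usize)? {
            Slot::Occupied(value) => Some(value),
            Slot::Free { .. } => None,
        }
    }
}
```

Note that a stale index still resolves to whatever value later reuses the slot, so this does nothing for the use-after-free problem.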

In my case, inserts and read operations vastly outnumber deletes. So much so that in all of my testing, I never saw a leaf node which could be freed anyway. (Leaves store ~32 values, and there were no cases where all of a leaf's values actually get deleted). I decided to just leak nodes if it ever happens in real life.

The algorithm processes data in batches then frees everything. So worst case, it just has slightly higher peak memory usage while processing. A fine trade in this case given it let me remove ~200 lines of code - and any bugs that might have been lurking in them.

pjmlp · 2025-04-29T07:32:43 1745911963

GC languages like C# don't need these tricks, because they are feature-rich enough to do C++-style low-level programming, and they have value types.

jstimpfle · 2025-04-29T10:04:06 1745921046

Having gone full-in on this approach before, with some good success, it still feels wrong to me today. Contiguous storage may work for reasonable numbers of elements, but it's potentially blocking a huge contiguous chunk of address space especially for large numbers of elements.

I probably say this because I still have to maintain 32-bit binaries (only 2G of address space), but it can potentially be problematic even on 64-bit machines (typically 256 TB of address space), especially if the data structure should be a reusable container with an unknown number of instances. If you don't know a reasonable upper bound on the number of elements beforehand, you have to reallocate later, or drastically over-reserve from the start. The former removes a pointer stability guarantee; the latter is uneconomical, and may even be uneconomical on 64-bit depending on how many instances of the data structure you plan to have. And having to reallocate when overflowing the preallocated space makes operations less deterministic with regard to execution time.

josephg · 2025-04-30T00:45:59 1745973959

> Having gone full-in on this approach before, with some good success, it still feels wrong to me today. Contiguous storage may work for reasonable numbers of elements, but it's potentially blocking a huge contiguous chunk of address space especially for large numbers of elements.

That makes sense. If my btree was gigabytes in size, I might rethink the approach for a number of reasons. But in my case, even for quite large input, the data structure never gets more than a few megabytes in size. That's small enough that resizing the vec has a negligible performance impact.

It helps that my btree stores its contents using lossless internal run-length encoding. Eg, if I have values like this:

    {key: 5, value: 'a'}
    {key: 6, value: 'a'}
    {key: 7, value: 'a'}

Then I store them like this:

    {key: [5..8), value: 'a'}

In my use case, this compaction decreases the size of the data structure by about 20x. There's some overhead in joining and splitting values - but it's easily worth it.
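To give a rough idea of the joining and splitting, here is a simplified sketch that works on a flat Vec rather than inside b-tree leaves. The `Run` type and both function names are invented for this example:

```rust
/// Hypothetical run: every key in the half-open range [start, end) maps to `value`.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Run {
    pub start: u32,
    pub end: u32,
    pub value: char,
}

/// Appends (key, value), extending the last run when the key is adjacent
/// to it and the value matches. Assumes keys arrive in increasing order.
pub fn push_entry(runs: &mut Vec<Run>, key: u32, value: char) {
    if let Some(last) = runs.last_mut() {
        if last.end == key && last.value == value {
            last.end += 1; // join: grow the existing run by one key
            return;
        }
    }
    runs.push(Run { start: key, end: key + 1, value });
}

/// Splits `run` into [start, key) and [key, end).
/// Returns None unless `key` lies strictly inside the run.
pub fn split_run(run: &Run, key: u32) -> Option<(Run, Run)> {
    if key <= run.start || key >= run.end {
        return None;
    }
    let left = Run { start: run.start, end: key, value: run.value };
    let right = Run { start: key, end: run.end, value: run.value };
    Some((left, right))
}
```

Pushing keys 5, 6 and 7 with value 'a' leaves a single run covering [5..8), matching the example above.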

Animats · 2025-04-29T22:10:36 1745964636

> What it doesn't protect you from is use-after-free bugs.

Yes. I've found that problem in index-allocated code.

Also, when you do this, you need an allocator for the indexes. I've found bugs in those.

ycombinatrix · 2025-04-29T05:46:35 1745905595

Could std::rc::Weak solve the backreference problem?

Animats · 2025-04-29T06:24:50 1745907890

Weak is very helpful in preventing ownership loops which prevent deallocation. Weak plus RefCell lets you do back pointers cleanly. You call ".borrow()" or ".borrow_mut()" to get access to the data protected by a RefCell. The run-time borrow check panics if the request conflicts with a borrow someone else already holds. This prevents two mutable references to the same data from existing at once, which is what Rust requires.
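A small sketch of the pattern, assuming a simple tree where children are owned through Rc and the parent link is a Weak. The `TreeNode` type and helper names are made up for illustration:

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

/// Hypothetical node: strong pointers go down, a weak pointer goes back up.
pub struct TreeNode {
    pub value: u32,
    pub parent: RefCell<Weak<TreeNode>>,
    pub children: RefCell<Vec<Rc<TreeNode>>>,
}

/// Creates a node with no parent and no children.
pub fn new_root(value: u32) -> Rc<TreeNode> {
    Rc::new(TreeNode {
        value,
        parent: RefCell::new(Weak::new()),
        children: RefCell::new(Vec::new()),
    })
}

/// Creates a child whose back pointer refers to `parent` without owning it.
pub fn add_child(parent: &Rc<TreeNode>, value: u32) -> Rc<TreeNode> {
    let child = Rc::new(TreeNode {
        value,
        parent: RefCell::new(Rc::downgrade(parent)),
        children: RefCell::new(Vec::new()),
    });
    // borrow_mut() would panic here if `children` were already borrowed.
    parent.children.borrow_mut().push(Rc::clone(&child));
    child
}

/// Follows the back pointer; None if there is no parent or it was dropped.
pub fn parent_value(node: &TreeNode) -> Option<u32> {
    node.parent.borrow().upgrade().map(|p| p.value)
}
```

The non-panicking variants `try_borrow()` and `try_borrow_mut()` return an error instead when the borrows would overlap.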

Static analysis could potentially check for those potential panics at compile time. If that was implemented, the run time check, and the potential for a panic, would go away. It's not hard to check, provided that all borrows have limited scope. You just have to determine, conservatively, that no two borrow scopes for the same thing overlap.

If you had that check, it would be possible to have something that behaves like RefCell, but is checked entirely at compile time. Then you know you're free of potential double-borrow panics.

I started a discussion on this on a Rust forum. A problem is that you have to perform that check after template expansion, and the Rust compiler is not set up to do global analysis after template expansion. This idea needs further development.

This check belongs to the same set of checks which prevent deadlocking a mutex against itself. There's been some work on Rust static deadlock analysis, but it's still a research topic.

josephg · 2025-04-29T06:59:59 1745909999

I didn't consider that. Looking at how weak references work, that might work. It would reduce the need for raw pointers and unsafe code. But in exchange, it would add 16 bytes of overhead to every node in my data structure. That's pure overhead - since the reference count of all nodes should always be exactly 1.

However, I'm not sure what the implications are around mutability. I use a Cursor struct which stores a reference to a specific leaf node in the tree. Cursors can walk forward in the tree (cursor.next_entry()). The tree can also be modified at the cursor location (cursor.insert(item)). Modifying the tree via the cursor also updates some metadata all the way up from the leaf to the root.

If the cursor stored a Rc<Leaf> or Weak<Leaf>, I couldn't mutate the leaf item because rc.get_mut() returns None if there are other strong or weak pointers pointing to the node. (And that will always be the case!). Maybe I could use a Rc<Cell<Leaf>>? But then my pointers down the tree would need the same, and pointers up would be Weak<Cell<Leaf>> I guess? I have a headache just thinking about it.
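To make the get_mut problem concrete, here is a small sketch with a stand-in `Leaf` type (not my real one). The second function uses RefCell, which is the usual interior-mutability wrapper for non-Copy data and moves the exclusivity check to run time:

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Stand-in for the leaf type; the real one is not shown in this thread.
pub struct Leaf {
    pub values: Vec<u32>,
}

/// Tries to mutate through Rc::get_mut, which only succeeds when no other
/// Rc or Weak points at the same allocation. Returns whether it succeeded.
pub fn push_exclusive(leaf: &mut Rc<Leaf>, v: u32) -> bool {
    match Rc::get_mut(leaf) {
        Some(inner) => {
            inner.values.push(v);
            true
        }
        None => false,
    }
}

/// With interior mutability any handle can try to mutate;
/// borrow_mut() panics if another borrow is active at that moment.
pub fn push_shared(leaf: &Rc<RefCell<Leaf>>, v: u32) {
    leaf.borrow_mut().values.push(v);
}
```

As soon as a single Weak back pointer exists, `push_exclusive` returns false, which is the situation every node in the tree would be in.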

Using Rc + Weak would mean less unsafe code, worse performance and code that's even harder to read and reason about. I don't have an intuitive sense of what the performance hit would be. And it might not be possible to implement this at all, because of mutability rules.

Switching to an array improved performance, removed all unsafe code and reduced complexity across the board. Cursors got significantly simpler - because they just store an array index. (And inserting becomes cursor.insert(item, &mut tree) - which is simple and easy to reason about.)
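A heavily simplified sketch of that shape, not my actual code. The field names are invented, there is no node splitting or metadata update, and it pretends leaves sit in key order in the Vec (a real tree would follow a next-leaf index instead):

```rust
/// Simplified leaf: just a sorted run of items.
pub struct Leaf {
    pub items: Vec<u32>,
}

/// Simplified tree: all leaves live in one Vec and are referred to by index.
pub struct Tree {
    pub leaves: Vec<Leaf>,
}

/// The cursor holds plain indices, so it borrows nothing from the tree.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Cursor {
    pub leaf_idx: usize,
    pub offset: usize,
}

impl Cursor {
    /// Inserts `item` at the cursor position and steps past it.
    /// Panics if the cursor does not point into an existing leaf.
    pub fn insert(&mut self, item: u32, tree: &mut Tree) {
        tree.leaves[self.leaf_idx].items.insert(self.offset, item);
        self.offset += 1;
    }

    /// Returns the item under the cursor and advances, moving on to the
    /// next leaf when the current one is exhausted.
    pub fn next_entry(&mut self, tree: &Tree) -> Option<u32> {
        while let Some(leaf) = tree.leaves.get(self.leaf_idx) {
            if let Some(&item) = leaf.items.get(self.offset) {
                self.offset += 1;
                return Some(item);
            }
            self.leaf_idx += 1;
            self.offset = 0;
        }
        None
    }
}
```

Because the tree is passed in on every call, the borrow checker sees an ordinary `&mut Tree` for the duration of the insert and nothing more.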

I really think the Vec<Node> / Vec<Leaf> approach is the best choice here. If I were writing this again, this is how I'd approach it from the start.

ycombinatrix · 2025-05-01T00:47:11 1746060431

Consider copy-pasting the code from Rc/Weak, then tweaking it to suit your needs (reduce the overhead).

I have done this before with stdlib stuff like io::Cursor and was pretty happy with the result.

lenkite · 2025-04-29T08:10:13 1745914213

One can also use this array-index approach in C++, utilize the `at` methods and have "memory safety guarantees", no?

tmtvl · 2025-04-29T09:06:02 1745917562

> What it doesn't protect you from is use-after-free bugs.

How about using hash maps/hash tables/dictionaries/however it's called in Rust? You could generate unique IDs for the elements rather than using vector indices.
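Something like this sketch is what I have in mind. `IdMap` is a made-up name, and the IDs are just a counter that is never reused, so a stale ID looks up nothing instead of someone else's element:

```rust
use std::collections::HashMap;

/// Hypothetical container that hands out unique, never-reused IDs.
pub struct IdMap<T> {
    next_id: u64,
    items: HashMap<u64, T>,
}

impl<T> IdMap<T> {
    pub fn new() -> Self {
        IdMap { next_id: 0, items: HashMap::new() }
    }

    /// Stores `value` and returns a fresh ID for it.
    pub fn insert(&mut self, value: T) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        self.items.insert(id, value);
        id
    }

    /// Removes the element; later lookups with the same ID return None.
    pub fn remove(&mut self, id: u64) -> Option<T> {
        self.items.remove(&id)
    }

    /// Returns the element for `id`, or None if it was removed or never existed.
    pub fn get(&self, id: u64) -> Option<&T> {
        self.items.get(&id)
    }
}
```

The trade-off is a hash lookup on every access and no contiguous layout, so the cache-locality benefit of the vector goes away.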