I've been hoping to put the code online for a long time, but I'm suffering from "it's not cleaned up enough" syndrome. But I'll talk a little here.
I'd been after real time ray tracing since the 1990's so I had a goal of putting both static environments and dynamic objects/players in the same acceleration structure. A lot of the old BSP-tree stuff or other BVH are used only for static geometry and I didn't like that. The problem with those structures is that if you move some things around it may require regeneration of the entire structure. Or at the very least, local updates that result in a different structure than if you regenerated the entire thing.
Part of the reason for that is trying to partition the geometric primitives roughly in two groups so a partition roughly cuts the scene in half. I decided that with a regular octtree structure, which nodes a primitive is contained in could be independent of the other primitives. They may share tree nodes, but I never decide whether to split a node based on content - instead each primitive is inserted into the nodes that make sense for it. This does lead to a few nodes having a lot of content - thing of the node that encloses a triangle fan - but those are rare and you won't typically be shooting lot of rays through them.
Also, with the primitives deciding what nodes they occupy, we can now do local deletes and inserts without having to regenerate the whole world structure.
For inserting a primitive we take its bounding box. If that is outside the root box of the world, I repeatedly double the size of the world by adding a new root node and placing the existing roots children and some new parents inside it - that's a neat recursion too. Next, we traverse the tree from the root calling on only the children that overlap the bounding box until we reach the node size that was requested. This decent is trivial for bounding boxes, but doing it with high precision specifically for triangles was harder.
Once the nodes that need to contain a primitive are reached, it's a matter of adding that primitive to a linked list off the tree node. But each primitive will need to exist in multiple lists. This means not putting the primitives directly in a list, but having a list of little link structures that are added to the nodes list and point back to the primitives.
For deletions, each primitive will need a list of link nodes that point back to it. Deletion consists of following this list of links and deleting them from the list that contains them.
So the links themselves will each be a part of two lists. They are part of a list coming off a primitive and they are part of a list coming off the tree node. It took a lot of thought to shrink the link node data structure, but at present it is 4 pointers in size. It needs to contain two "next" pointers, a "previous" pointer (to allow deletion from the tree nodes list of primitives), and it also has to have pointers to both the tree node AND the primitive.
The reasons for those last two pointers is thus: when checking which primitives a ray intersects, we want to start at the tree node that contains it and quickly get to the primitives for ray/object intersection tests. This is done by following the list of links via their "next" pointer which also following their links to "primitive". For deletions, we start at the primitive and follow the other set of "next" pointers and do deletes from the other list by using both the next and previous pointers. We never need a previous pointer for the list off the primitive because that entire list is always deleted, where the other list may still contain other primitives. There remains a problem that when an Octree node becomes empty, I need to prune back the tree - this is why the links need to also contain a pointer back to the tree node that contained the link. So now we're looking at a link structure that contains 5 pointers {2 nexts, one previous, one primitive, and one treenode}
Then I had the insight that no matter which list we're traversing, we always have a pointer to either the treenode or the primitive. So those two pointers can be combined via sum or XOR into a single value. If your deleting a primitive you just XOR a pointer to the primitive with that value to recover a pointer to the tree node. When you're doing intersection tests, you XOR the treenode pointer with that value to recover a pointer to the primitive. This change resulted in no significant performance change while reducing the structure to a nice size of 4 pointers.
Another problem was that having a linked list with a previous pointer meant the octree node needed to always contain one of these link structures because deletion depended on that "back" pointer pointing at another link. Turns out you can have the back pointer point directly to the "next" pointer of the previous link. That means the octree node now only needs a single pointer to the first "link" node in the list and that links back pointer points directly to that pointer.
So for fitting an arbitrary number of things (primitives) each into an arbitrary number of boxes (tree nodes) we add a single pointer to the thing structure, a single pointer to the box structure, and then use these little link structures that are the size of just 4 pointers.
It was a long journey to arrive at that data structure, but for a while it was real point of pride for me. The structures are nice, but the code is still a bit messy with the remains of several experiments still around. Also notice that the list in one direction does not have a back pointer, so you can never delete a box directly - well you could but it's not as easy. There isn't complete symmetry between the boxes and things.