My sense is this is only true today because OS kernels are ridiculously slow relative to what the hardware can achieve.
Most of my recent designs treat RAM as if it were (what we used to consider) a disk, i.e. all computation and in-process data live in cache exclusively, and "going to RAM" requires the use of a B-tree-like structure to amortize the cost.
For example, once you've opened a RAM page on a normal four-channel Xeon server, you can read the entire 4KB page in about the same time it takes to read one byte, switch pages, and then read another byte. (Of course, you can't actually read just one byte either, since an entire cache line will be filled, but the overall point still stands.)
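To make that concrete, here's a minimal microbenchmark sketch (mine, not something from the article) contrasting a sequential scan, which mostly stays within open rows and prefetches well, against random touches across a buffer far larger than cache. The exact ratio varies by machine, but the gap is commonly well over an order of magnitude:

```c
/* Illustrative microbenchmark sketch (not from the article): sequential
 * 64-byte strides mostly reuse open DRAM rows and prefetch well, while
 * random touches across a buffer much larger than cache miss the row
 * buffer (and pull in a full cache line) almost every time. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define BUF_SIZE (1u << 28)   /* 256 MiB, far larger than any cache */
#define TOUCHES  (1u << 22)

static double now_sec(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

int main(void) {
    uint8_t *buf = malloc(BUF_SIZE);
    if (!buf) return 1;
    for (size_t i = 0; i < BUF_SIZE; i++) buf[i] = (uint8_t)i;

    volatile uint64_t sink = 0;

    /* Sequential: one touch per cache line, many lines per open row. */
    double t0 = now_sec();
    for (size_t i = 0; i < BUF_SIZE; i += 64) sink += buf[i];
    double seq = now_sec() - t0;

    /* Random: xorshift64 scatter, nearly every touch lands in a new row. */
    uint64_t x = 88172645463325252ull;
    t0 = now_sec();
    for (size_t i = 0; i < TOUCHES; i++) {
        x ^= x << 13; x ^= x >> 7; x ^= x << 17;
        sink += buf[x & (BUF_SIZE - 1)];
    }
    double rnd = now_sec() - t0;

    printf("sequential: %.2f ns/line, random: %.2f ns/touch (sink=%llu)\n",
           seq * 1e9 / (BUF_SIZE / 64), rnd * 1e9 / TOUCHES,
           (unsigned long long)sink);
    free(buf);
    return 0;
}
```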
The situation we're in today with RAM is pretty much identical to the situation with the disks of yore. Anyway…interesting article nonetheless.
Right, modern CPUs can do 50 gigaflops per core. There's absolutely no chance we're going to have non-volatile storage that can do hundreds of billions of IOPS any time soon (if only because you won't be able to get that much data over PCI Express).
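Back-of-envelope, assuming today's PCIe 3.0: an x16 link tops out around 15.75 GB/s after encoding overhead, so even at a minimal 64 bytes per operation that's roughly 250 million ops/s, a factor of several hundred short of "hundreds of billions of IOPS". The bus is the wall long before the flash is.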
Further, given that you can saturate 16 lanes of PCIe when talking to a GPU, there's no reason you shouldn't be able to do the same for storage; it's just a matter of having the right abstractions and the right kind of thinking, like you're saying.
It sounds more like storage and RAM are going to converge (and people are still learning to deal with how slow RAM is compared to the CPU these days).
Not sure exactly what OP is referring to, but CSS-trees [1] are a classic example of cache-aware indexing structures that fetch entire pages into cache and arrange data so that most of the comparisons happen on cached data. In most cases, they significantly outperform binary trees. Masstree [2] is a more recent example of this.
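Here's a minimal sketch in the spirit of CSS-trees (not code from either paper), assuming 64-byte cache lines, 4-byte keys, and an implicit pointer-free node array; bulk-loading is omitted:

```c
/* Sketch of the CSS-tree layout trick (in the spirit of Rao & Ross, not
 * their actual code), assuming 64-byte cache lines and 4-byte keys.
 * Nodes are packed one per cache line and stored contiguously, so child
 * positions are computed arithmetically instead of chased through
 * pointers: each level of the descent costs one line fetch, and every
 * comparison within a node hits data that fetch already brought in. */
#include <stddef.h>
#include <stdint.h>

#define KEYS_PER_NODE 16   /* 16 x 4-byte keys fill one 64-byte line */

typedef struct {
    _Alignas(64) int32_t keys[KEYS_PER_NODE];  /* sorted separators */
} node_t;

/* Implicit layout: node n's children occupy slots
 * n*(KEYS_PER_NODE+1)+1 ... n*(KEYS_PER_NODE+1)+KEYS_PER_NODE+1. */
static size_t child(size_t n, size_t c) {
    return n * (KEYS_PER_NODE + 1) + c + 1;
}

/* Descend `internal_levels` levels from the root (node 0) and return the
 * leaf node to scan; the scan inside each node is a plain linear search,
 * which is cheap once the line is resident in cache. */
static const node_t *descend(const node_t *nodes, int internal_levels,
                             int32_t key) {
    size_t n = 0;
    for (int l = 0; l < internal_levels; l++) {
        const node_t *nd = &nodes[n];
        size_t c = 0;
        while (c < KEYS_PER_NODE && key >= nd->keys[c])
            c++;
        n = child(n, c);
    }
    return &nodes[n];
}
```

The win comes from spending the whole node on keys: with no child pointers, one cache-line fetch per level buys 16 comparisons' worth of data.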