Archiving. You've described archiving. Taking less-recently-used data off of main storage (RAM in this case) onto a slower, cheaper medium.
Heck, the only reason we have hard disks and memory is that it's not economically feasible to have billions of CPU registers, and that there's a further trade-off between volatility and speed. This is literally an older technique than the computer. Files are kept in the main office until they're not being used and then moved to the basement for long-term storage.
Yes, that's the basic principle, but I think the "transactionally-safe" part is the point of the paper, i.e. the fine-grained eviction and non-blocking fetches.
Anti-caching is _literally_ caching. The system may be fantastic, but that's what it's doing.
Consider the following key characteristics of "anti-caching":
(1) Cold data is moved from RAM to disk.
This is cache replacement. Eventually, caches fill and you have to choose what to evict. There are many replacement algorithms, but one of the most popular is LRU, which is what is used here. In a conventional CPU cache, data is moved a cache line at a time; here, it is moved a tuple at a time, transactionally.
(2) There is only one item present in either RAM or disk.
This is (almost) exclusive caching, which maintains exactly one copy of an item across all levels of the cache hierarchy (as done by the AMD Athlon). The key difference here is that the item is removed from the bottom of the hierarchy as well. That twist may well be novel, but as far as I can tell, it is the primary novelty.
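To make characteristics (1) and (2) concrete, here's a toy sketch of LRU eviction combined with the exclusive one-copy invariant (each key lives in exactly one tier). The class and tier names are mine for illustration, not anything from the paper:

```python
from collections import OrderedDict

class LRUStore:
    """Toy two-tier store: hot keys in 'ram', cold keys spilled to 'disk'.

    Illustrative only: real anti-caching evicts tuples transactionally
    and fetches without blocking; this sketch only shows the LRU +
    exclusive-copy bookkeeping.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.ram = OrderedDict()  # iteration order doubles as recency order
        self.disk = {}            # the slower, cheaper tier

    def get(self, key):
        if key in self.ram:
            self.ram.move_to_end(key)      # touch: mark as most recently used
            return self.ram[key]
        value = self.disk.pop(key)         # non-resident: pull it back up...
        self.put(key, value)               # ...which may evict another key
        return value

    def put(self, key, value):
        self.disk.pop(key, None)           # exclusive: never in both tiers
        self.ram[key] = value
        self.ram.move_to_end(key)
        while len(self.ram) > self.capacity:
            cold_key, cold_val = self.ram.popitem(last=False)  # evict LRU
            self.disk[cold_key] = cold_val
```

With `capacity=2`, inserting "a", "b", "c" spills "a" to disk; reading "a" brings it back and spills "b", so every key is always in exactly one tier.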
To be clear, adapting all of this to DBMS architecture may be a great idea, but let's call things by their names.
From past experience in the database industry, command logs are insufficient for maintaining data integrity after a crash. They capture the bare minimum needed to reproduce a transaction, which is frequently not enough information. A log of "I'm changing field X from A to B" is far more reliable in practice.
If they are capturing enough to reproduce a transaction, then by definition that's enough information :). Non-deterministic commands (e.g. WHERE RANDOM() without a known seed) are the real problem.
I think it really depends on the product. Determinism has been a big issue with logical logs in many systems, but it’s only a problem if you don’t plan and test for it.
Like andrew, I think the premise is misleading: any worthwhile in-memory database inverts the cache-and-RAM situation, because the whole point is to be in memory. They all typically have a way of persisting that data to secondary storage over time. Solo [1], for instance, can do 700,000 transactions a second on a 32-core machine with data regularly saved to disk.
Now, the strategy they use is interesting. I'd just rather see an apples-to-apples comparison of it against DBs like FoundationDB, F1, and others doing high performance with strong consistency. The more modern stuff, that is. MySQL and its performance numbers are quite dated.