Classic blog from 2010. The idea didn't catch on, however, and no one uses these terms. PNUTS is still completely closed source inside Yahoo -- this is because PNUTS depends on an internal queuing system.
As for YCSB, it's commonly referenced and is easily available on GitHub. It hasn't changed much lately, and now with tools like Jepsen that focus on correctness as well as performance, YCSB is no longer the preferred testing tool.
I agree, and I wish the concepts here were more widely adopted. Many people seem to equate the low-consistency modes (offered by Dynamo/Cassandra etc.) with CAP and guaranteeing availability in the presence of network partitions. In practice, network partitions seem much less of a concern, and the low-consistency modes are really needed to get reasonable performance/latency instead.
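To make the "low consistency for latency" point concrete, here is a minimal sketch using the DataStax Python driver's tunable consistency levels. The contact point, keyspace, and table are placeholders I made up; the ConsistencyLevel knob itself is the only point.

    # pip install cassandra-driver
    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    cluster = Cluster(["127.0.0.1"])             # placeholder contact point
    session = cluster.connect("demo_keyspace")   # placeholder keyspace

    # CL=ONE: answer as soon as a single replica responds -- lowest latency,
    # weakest consistency. Chosen for speed, not because a partition is expected.
    fast_read = SimpleStatement(
        "SELECT value FROM kv WHERE key = %s",
        consistency_level=ConsistencyLevel.ONE,
    )

    # CL=QUORUM: wait for a majority of replicas -- higher latency, but the read
    # reflects the latest majority-acknowledged write.
    safe_read = SimpleStatement(
        "SELECT value FROM kv WHERE key = %s",
        consistency_level=ConsistencyLevel.QUORUM,
    )

    row = session.execute(fast_read, ["some-key"]).one()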
The concepts listed here are either (a) too complex or (b) not complete enough to provide a full, rich mental model.
CAP was highly successful because it fit in with an existing meme: 3 things, and you can only get 2 of them. It's simple, and easy to apply in a trivial fashion.
But the CAP theorem doesn't have enough "meat" as an engineering analysis, and it doesn't guide your system design. Yes, there's a proof, but it doesn't tell you how to balance the three concepts or how to directly compare system designs.
The thing is that as a descriptive model of how an entire distributed database works, CAP just ain't enough. That's why blog posts like https://aphyr.com/posts/313-strong-consistency-models and concepts like 'linearizability' are very useful.
Having implemented and used multiple consistent coordination systems -- Raft+ in c5 (which I helped write), Single Decree Paxos via DConE at WanDISCO, and timeline-consistent implementations via HBase -- I have to say the academics miss that the devil is in the implementation details. We all focus on high-level things like CAP (although FLP is what the real hard-core academics care about), when system details like how you aggregate fsyncs, system pauses (especially with GC), and how you integrate your coordination system into the larger system play a much larger role in overall system latency. The coordination posture is a minor detail when compared to GC issues. Now I know what you're going to say: "He's not an academic, he ran a real DB company." I totally disagree. The DB community in general focuses on the wrong things. That's true in Hadoop, it's true with Cassandra, and I would bet it's true at Google as well.
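Since "aggregating fsyncs" may be unfamiliar: here is a minimal group-commit sketch of the idea, where many writer threads append log entries but a single background thread issues one fsync that makes the whole batch durable. This is only an illustration of the general technique under assumptions I made up; it is not how c5, DConE, or HBase actually implement it.

    import os
    import threading

    class GroupCommitLog:
        """Toy write-ahead log: one fsync covers every entry appended since the last flush."""

        def __init__(self, path):
            self._f = open(path, "ab")
            self._lock = threading.Lock()
            self._flushed = threading.Condition(self._lock)
            self._pending = 0      # entries written but not yet durable
            self._generation = 0   # bumped after every successful fsync

        def append(self, entry: bytes) -> None:
            """Blocks until the entry has been covered by a group fsync."""
            with self._lock:
                self._f.write(entry)
                self._pending += 1
                gen = self._generation
                while self._generation == gen:   # wait for the next group flush
                    self._flushed.wait()

        def flush_loop(self) -> None:
            """Run in a background thread; batches all pending entries into one fsync."""
            while True:
                with self._lock:
                    if self._pending == 0:
                        self._flushed.wait(timeout=0.001)  # nothing to flush yet
                        continue
                    self._f.flush()
                    os.fsync(self._f.fileno())   # one disk sync for the whole batch
                    self._pending = 0
                    self._generation += 1
                    self._flushed.notify_all()   # wake every waiting append()

    # Usage sketch:
    # log = GroupCommitLog("/tmp/group_commit.log")
    # threading.Thread(target=log.flush_loop, daemon=True).start()
    # log.append(b"entry-1\n")   # many threads can call this concurrently

The win is that N concurrent appends cost one fsync instead of N, which is the kind of implementation detail the comment above says dominates overall latency.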
A classic! If you had to take away one thing from this article, it would be that the CAP theorem (while useful) doesn't take latency into consideration, which might be even more important than partition tolerance. Yes, partition tolerance is inevitable, but you have to deal with latency every single time!
The author of CAP himself said that the CAP theorem gained undue popularity. It is an interesting theorem only because it models the trade-offs a partition implies when one happens.
But partitions are rare. And when partitions happen, you generally can't access the partitioned area, so a lot of problems disappear.
Availability isn't an on/off switch; there is a wide range of how "available" you can be and what you can do. For example, you can allow reads but disallow writes (see the sketch below).
Last but not least, the most important question is what happens after the partition is over and what level of guarantees you offer regarding coherence.
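To illustrate the reads-allowed/writes-refused mode mentioned above, here is a toy sketch of a node that degrades to read-only when it cannot reach a write quorum instead of going fully unavailable. The ping()/replicate() peer methods and everything else here are assumptions, not anything from the article.

    class PartitionedNode:
        def __init__(self, store, peers, quorum):
            self.store = store      # this node's local key/value copy
            self.peers = peers      # peer handles with ping() and replicate()
            self.quorum = quorum    # replicas required to accept a write

        def reachable_peers(self):
            return [p for p in self.peers if p.ping()]

        def read(self, key):
            # Reads stay available during a partition, but may be stale.
            return self.store.get(key)

        def write(self, key, value):
            live = self.reachable_peers()
            if len(live) + 1 < self.quorum:        # +1 counts this node
                raise RuntimeError("read-only: cannot reach a write quorum")
            self.store[key] = value
            for p in live:
                p.replicate(key, value)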
Partitions are not rare. A partition simply means that one node is not available to another, which does not have to be caused by a network failure. For example, a stop-the-world GC pause is equivalent to a network failure: the node is unreachable during that time. So is a deployment with downtime (e.g. a 10 second restart per node).
Also, while you can't access the partitioned area, you might have nodes outside that area. The question is whether the non-partitioned nodes become useless to you or not.
10 seconds/day of downtime breaks 4 9's (99.99% allows only about 8.6 seconds/day). The fact that this might be below the TCP timeout is irrelevant; that's almost certainly NOT the timeout that will be considered unavailability. E.g., suppose you are delivering an advertising tag: unavailable means > 200ms.
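For reference, the per-day downtime budget for each level of "nines" is plain arithmetic over an 86,400-second day:

    SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

    for nines in (2, 3, 4, 5):
        availability = 1 - 10 ** -nines               # e.g. 3 nines -> 0.999
        budget = SECONDS_PER_DAY * (1 - availability)
        print(f"{nines} nines ({availability:.5f}): {budget:.2f} s/day allowed")

    # 2 nines (0.99000): 864.00 s/day allowed
    # 3 nines (0.99900): 86.40 s/day allowed
    # 4 nines (0.99990): 8.64 s/day allowed
    # 5 nines (0.99999): 0.86 s/day allowed

So 10 seconds/day fits inside three 9's but blows through four.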
So those nodes can't access the partitioned area, and you don't have any conflict.
The TCP timeout is kind of irrelevant if your application-defined timeout is lower. Cassandra's default timeout is 10s, but in practice most users expect their responses in less than a second (see the timeout sketch below).
Also, a partition in the system's definition can be as small as a single node failure.
But I've seen plenty of systems that handle single-node failure and still get fucked when you have a network split where 50% of the nodes can talk to each other but not to the other 50%. The CAP theorem doesn't help you build a system that avoids doing very bad things when that happens.
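Picking up the timeout point a couple of comments up (the application-defined deadline, and the 200 ms ad-tag example before that): here is a minimal sketch of a client enforcing its own deadline, far below anything TCP-level. The URL is a placeholder; the timeout parameter of requests is the only point.

    import requests

    def fetch_ad_tag():
        try:
            # Give up after 200 ms -- far below any TCP retransmission timeout,
            # which can run to many seconds or minutes.
            resp = requests.get("http://ads.example.com/tag", timeout=0.2)
            return resp.text
        except requests.exceptions.Timeout:
            return ""   # degrade gracefully instead of blocking the caller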
The way I've heard the CAP theorem used practically at a high level is to frame the question as follows:
What happens in the case of a (logical) network partition? An AP system continues taking requests and provides eventual consistency, while a CP system waits for the partition to go away, or tells clients to come back later.
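A toy sketch of that framing, with every name made up: during a partition the CP node refuses requests it cannot coordinate, while the AP node keeps answering from its local replica and reconciles later.

    class CPNode:
        def __init__(self, local, can_reach_majority):
            self.local = local                            # this node's copy of the data
            self.can_reach_majority = can_reach_majority

        def get(self, key):
            if not self.can_reach_majority():
                # Gives up availability: tell the client to come back later.
                raise TimeoutError("partitioned: come back later")
            return self.local[key]

    class APNode:
        def __init__(self, local):
            self.local = local
            self.hinted = []            # writes to replay once the partition heals

        def get(self, key):
            return self.local.get(key)  # always answers, possibly stale

        def put(self, key, value):
            self.local[key] = value
            self.hinted.append((key, value))  # eventual consistency: sync later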
You can be available, or you can be consistent. The higher the consistency guarantee you demand, the lower your availability.
Imagine a small database with a million copies on a million machines. When I write something, I'm going to have to wait a bit while all million machines acknowledge the write is complete. If I have a lower threshold, say 1, the write completes really fast, but the million machines aren't consistent (see the sketch below).
Real systems balance consistency and availability.
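A sketch of the threshold idea in that example, assuming each replica exposes a blocking write() that returns an acknowledgment (all names here are hypothetical): return as soon as w replicas have acked, so w = len(replicas) is the slow, consistent end and w = 1 is the fast end where replicas temporarily diverge.

    from concurrent.futures import ThreadPoolExecutor, as_completed

    def replicated_write(replicas, key, value, w):
        """Return True once `w` of the replicas have acknowledged the write."""
        pool = ThreadPoolExecutor(max_workers=len(replicas))
        futures = [pool.submit(r.write, key, value) for r in replicas]
        acks = 0
        ok = False
        for fut in as_completed(futures):
            if fut.result():            # replica acknowledged the write
                acks += 1
            if acks >= w:
                ok = True
                break                   # don't wait for the stragglers
        pool.shutdown(wait=False)       # remaining writes finish in the background
        return ok

    # replicated_write(replicas, "k", "v", w=len(replicas))  -> consistent but slow
    # replicated_write(replicas, "k", "v", w=1)              -> fast; replicas lag briefly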
If I understand the author correctly: a CP system sacrifices Availability only during a Partition, whereas an AP system tends to sacrifice Consistency all the time.