Why is Cockroach adding READ COMMITTED? Is using a lower isolation level better for performance, or does it just reduce the number of serialization errors and retries that need to be handled?
READ COMMITTED is great for applications that need a coherent snapshot of the database but not necessarily the absolutely most recent data, which in my experience is actually most apps.
It allows readers to see valid data (relationships are correct), while not blocking writers. It can be the difference between constant deadlocks and super-high throughput without lock contention.
The main motivation is to reduce serialization errors, for applications that can handle the weaker isolation level, especially applications that were previously running fine under READ COMMITTED on another database.
It's often talked about how NewSQL databases offer better scalability than standard SQL databases, but I think it's maybe sometimes underappreciated how (some, not all) of them are also much simpler in terms of their consistency models.
I'd speculate this is because Postgres and friends try to eke out every bit of single-node performance (which helps with single-row throughput and overall throughput, where they are obviously much better than NewSQL), but the scalability of NewSQL databases might allow them to prefer easy consistency over single-node performance.
Possibly this is also just the passage of time benefiting newer systems.
Read committed is explicitly asking for hard mode. If you want a simple life, stick with Serializable, as always. It took years before people found anomalies in Repeatable Read in Postgres. This stuff is hard even for world-class researchers.
You always need app-level retries for the SERIALIZABLE isolation level. You don't need any explicit locking - the database should handle that for you (and in the case of PostgreSQL, locking is not the only tool it uses for avoiding serialization anomalies).
The strategy I use is to keep transactions as small as possible, and have retry functionality built into the transaction abstraction, so the business logic doesn't really need to worry about it. I also explicitly use read-only transactions where possible.
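That retry abstraction can be sketched in a few lines. This is illustrative only: `SerializationFailure` stands in for whatever your driver raises on SQLSTATE 40001, and the backoff numbers are arbitrary.

```python
import random
import time


class SerializationFailure(Exception):
    """Stand-in for a driver-level SQLSTATE 40001 error."""


def run_transaction(txn_body, max_retries=5):
    """Run txn_body, retrying on serialization failures.

    The retry loop lives here, so business logic never sees it.
    """
    for attempt in range(max_retries):
        try:
            return txn_body()
        except SerializationFailure:
            if attempt == max_retries - 1:
                raise
            # Jittered exponential backoff before the next attempt.
            time.sleep(random.uniform(0, 0.001 * 2 ** attempt))


# Demo: a transaction body that aborts twice before committing.
attempts = {"n": 0}

def flaky_body():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise SerializationFailure()
    return "committed"

result = run_transaction(flaky_body)  # succeeds on the third attempt
```

In a real app, `txn_body` would open the transaction, run its statements, and commit, so a retry re-executes the whole read-write cycle rather than replaying stale reads.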
Even more generally, distributed systems can find simpler solutions to things like "raise the throughput ceiling", and "handle disk failure", and "handle power failure" than single-box systems. This is for the simple reason that they have more options: beyond the constraints of a box, resource allocation is more flexible, failures less correlated, etc. That allows modern distributed databases to simply avoid some of the super hard problems that prior databases had to solve. Efficiency is still important, but the thing to optimize is mean system efficiency, not the peak performance of a handful of super hot boxes.
There's also the fact that decades of DB research have brought techniques and approaches that beat old ones, and retrofitting existing systems with them can be hard (e.g. see the efforts to remove some of the sharp edges of PG's MVCC behavior and how hard they've turned out to be).
How does the CockroachDB approach not deadlock? Surely retrying could encounter a situation where two competing UPDATEs lock rows in different orders, and no amount of retrying will unlock the required rows, right?
I wondered this too and found this in the docs[0]:
> Transactions at all isolation levels are subject to lock contention, where a transaction attempts to lock a row that is already locked by a write or locking read. In such cases, the later transaction is blocked until the earlier transaction commits or rolls back, thus releasing its lock on the row. Lock contention that produces a deadlock between two transactions will result in a transaction abort and a 40001 error (ABORT_REASON_ABORTED_RECORD_FOUND or ABORT_REASON_PUSHER_ABORTED) returned to the client.
So it looks like you still get a good old 40001 error, just like with SERIALIZABLE isolation.
It's a good question. For simple UPDATEs, CockroachDB always executes in a deterministic, serial order and so it's likely the rows will be locked in the same order by any competing updates. (This can be confirmed by looking at the query plans.) Complex UPDATEs using joins and subqueries will need explicit ORDER BY to always lock in the same order.
If an UPDATE has to retry halfway through, locks are held across the retry to help the system make progress. But as you point out, this could cause lock acquisition to happen in an unexpected order if new rows qualify during a retry. So far we haven't run into this, but we might need to provide an option for an UPDATE to drop locks on retry if deadlock turns into a bigger problem than livelock. It depends on the workload.
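The lock-ordering idea can be illustrated outside the database. This is a toy sketch with ordinary mutexes, nothing to do with CRDB's internals: if every transaction acquires its row locks in the same global order, the classic two-row deadlock can't form.

```python
import threading

# Toy "rows": one mutex per row, keyed by primary key.
row_locks = {1: threading.Lock(), 2: threading.Lock()}

results = []

def update_rows(keys, label):
    """Lock the given rows in sorted key order, then 'update' them.

    Sorting is the crux: with txn1 asking for (1, 2) and txn2 for
    (2, 1), both contend on row 1 first instead of each grabbing
    one row and waiting forever for the other.
    """
    for k in sorted(keys):
        row_locks[k].acquire()
    try:
        results.append(label)
    finally:
        for k in sorted(keys, reverse=True):
            row_locks[k].release()

t1 = threading.Thread(target=update_rows, args=([1, 2], "txn1"))
t2 = threading.Thread(target=update_rows, args=([2, 1], "txn2"))
t1.start(); t2.start()
t1.join(); t2.join()  # both finish; without sorting this could deadlock
```

Databases can't always sort like this (the set of qualifying rows may only be discovered during the scan), which is exactly why the comment above mentions explicit ORDER BY for complex UPDATEs.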
> So far we haven't run into this, but we might need to provide an option for an UPDATE to drop locks on retry if deadlock turns into a bigger problem than livelock.
I'm not sure what you mean by that: the design can deadlock but you just have not seen it happening yet?
Edit: oh, I see in a comment below that deadlocks are detected and abort the transaction.
If `UPDATE player SET level = 'AA' WHERE team = 'Gophers';` is executed before the player swap, then why should "Stonebreaker" be upgraded to "AA"? I'd be pretty mad at my database if I sent those 2 queries in sequence and my DB decided to re-order them.
The sleep actually really complicates things here. I understand some queries run slower than others and the sleep is a useful tool to artificially slow things down, but now I don't know if I should interpret that as one command or two. If `WITH sleep AS (SELECT pg_sleep(5))
UPDATE player SET level = 'AA' FROM sleep WHERE team = 'Gophers';` is atomic then I'd expect it to put a lock on the 3 Gophers (which doesn't include Stonebreaker), wait the 5 seconds and then complete the update. The player swap would be blocked for those 5 seconds because it touches a row that's being updated.
The timing of this example is tricky because the two update statements execute concurrently (which is only possible under read committed isolation; under serializable isolation it's much more like what you're describing).
Here's a full timeline in PG (U1 for first update, U2 for second update):
> If `UPDATE player SET level = 'AA' WHERE team = 'Gophers';` is executed before the player swap, then why should "Stonebreaker" be upgraded to "AA"? I'd be pretty mad at my database if I sent those 2 queries in sequence and my DB decided to re-order them.
It’s easy to miss but in the swap query the levels also get swapped. Because — and it’s harder to miss but easy to skip over — given what constraint 2 says the level is actually a team level, not a player level.
So in a seqcst view, either the team’s players get upgraded to AA then the players’ levels get swapped during the exchange, or the players get exchanged then the team’s players get upgraded.
In both sequences you end up with Stonebreaker as an AA gopher and Lamport as an A dolphin.
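Assuming both teams start at level A (my reading of the example; the initial levels aren't stated in this thread), the two serial orders can be checked mechanically:

```python
def upgrade_gophers(players):
    # UPDATE player SET level = 'AA' WHERE team = 'Gophers';
    for p in players.values():
        if p["team"] == "Gophers":
            p["level"] = "AA"

def swap(players, a, b):
    # The swap exchanges both team and level between the two players.
    players[a]["team"], players[b]["team"] = players[b]["team"], players[a]["team"]
    players[a]["level"], players[b]["level"] = players[b]["level"], players[a]["level"]

def initial():
    # Assumed starting state: both teams at level A.
    return {
        "Lamport": {"team": "Gophers", "level": "A"},
        "Stonebreaker": {"team": "Dolphins", "level": "A"},
    }

s1 = initial()          # order 1: upgrade, then swap
upgrade_gophers(s1)
swap(s1, "Lamport", "Stonebreaker")

s2 = initial()          # order 2: swap, then upgrade
swap(s2, "Lamport", "Stonebreaker")
upgrade_gophers(s2)
# Both serial orders agree: Stonebreaker is an AA Gopher,
# Lamport is an A Dolphin.
```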
> Then it's bad unnormalized data design that is the problem here.
That the table is not normalised makes the example somewhat confusing but it does not actually affect the issue being demonstrated. And denormalisation is a fact of data modelling.
Why not simply use an example that isn't confusing? Many developers, and especially academics, love wasting time and effort solving issues they made up but that have no real-life examples. Examples with trivial alternative solutions don't help me as a reader distinguish whether this is a real problem or something made up.
Because every example you can think of is confusing, because understanding how concurrent transactions should operate is confusing.
Almost all problems like this can be solved by improving an application data model, but here’s the thing: lots of applications have dodgy data models, either due to time constraints, or because the application evolved over time and the data model didn’t. So these are all real-world problems and examples, but creating a “simple” problem to demonstrate the issue almost certainly means also creating an example where other “obvious” solutions exist.
Just because you can’t imagine how this simple example might represent a much more complex “real world” problem, doesn’t mean it doesn’t exist.
> If that is a team level, it should be in the team table, not the player table.
It's all contrived, of course, but the reason I would consider skill level to be a player attribute rather than a team attribute is that there could be free agents with a skill level but no team:
INSERT INTO player VALUES (10, 'Pavlo', 'AAA', NULL);
Then with enough free agents, you could imagine building a new team out of free agents that are all at the same skill level:
UPDATE player SET team = 'Otters' WHERE level = 'AAA' AND team IS NULL ORDER BY id LIMIT 3;
My guess is orgs that have a real need for a distributed SQL database (which are rare in the space of orgs) will make their choices based on requirements analysis rather than naming.
So probably if you would be turned off enough by the name not to use the software, you don't actually need a distributed SQL database and are not the target customer.
The flip side of that is that businesses are more than okay with selling their products to organizations for which it’s overkill.
Gyms know that people that sign up on Jan 2 don’t need annual (or any) membership. They take their money anyway. The more honorable ones allow cancellation or something but I don’t think anyone expects them to actively turn these folks away.
I couldn’t imagine suggesting cockroachDB in a planning meeting for a new product. Especially when over half the room isn’t engineering. Just won’t happen.
PG only uses EvalPlanQual under read committed isolation. Under repeatable read the first update fails with a "could not serialize" error, just as it does under serializable.
How do you incorporate SELECT FOR UPDATE locking into Raft? Also, maybe slightly off topic, but how does CockroachDB deal with read-only queries and the Raft log?
Prior to building read committed, CRDB only supported intents as a form of replicated locks. Intents are provisional values written by active transactions which are resolved once the writing transaction commits or aborts.
As part of building read committed, we introduced other forms of replicated locks. In particular, replicated shared and exclusive locks (SELECT FOR UPDATE statements acquire exclusive locks). Locks acquired by read committed transactions are always replicated.
The unit of replication in CockroachDB is called a range, and every range is its own raft group. Every range has a reserved portion of its keyspace where locks can be stored. This is known as the replicated lock table. Incorporating exclusive/shared locks in the replicated lock table involved adding a notion of lock strength to all locks that were stored (as previously they were all assumed to be intents).
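A toy model of the lock-strength idea may help; this is nothing like CRDB's actual replicated lock table, just the shared/exclusive compatibility rule the comment describes:

```python
# Toy lock-table sketch: per-key list of (txn, strength) holders.
# Compatibility rule: shared/shared is fine; any pair involving an
# exclusive lock conflicts, so the later request would have to wait.
COMPATIBLE = {("shared", "shared")}

class LockTable:
    def __init__(self):
        self.locks = {}  # key -> list of (txn, strength)

    def try_acquire(self, key, txn, strength):
        holders = self.locks.setdefault(key, [])
        for held_txn, held_strength in holders:
            if held_txn != txn and (held_strength, strength) not in COMPATIBLE:
                return False  # conflict: caller would block on held_txn
        holders.append((txn, strength))
        return True

lt = LockTable()
r1 = lt.try_acquire("k1", "txn1", "shared")     # granted
r2 = lt.try_acquire("k1", "txn2", "shared")     # shared is compatible
r3 = lt.try_acquire("k1", "txn3", "exclusive")  # blocked by the shared holders
```

In CRDB the holders live in a reserved portion of each range's keyspace and are replicated through Raft; intents are effectively the strongest entry, since they carry a provisional value as well as a lock.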
Look, every single time this database is mentioned here, this exact comment comes up. Every. Time. For years, and years, and years. And yet, true to its name, the database and the team building it are still around.
At some point we've got to realize the CRDB team is comfortable with the name, it's not going to change, and all comments lambasting it are just farts in the wind.
I'm relatively ambivalent on the name issue specifically, but I do think this has been hashed over ad nauseam, to the point of making for boring reading. This line from the HN guidelines applies:
> Please don't pick the most provocative thing in an article or post to complain about in the thread. Find something interesting to respond to instead.
I scoffed at first, but I guess cockroach sounds to some what "maggot" sounds like to me.
Like the other person said, it's related to the survivability of cockroaches. One can hope for a management tier competent enough to evaluate an internal database on the merits.
If not for git, you'd have to commit to Subversion! And then back up the whole thing with Duplicity.
I'm not a fan of such sense of humor, and prefer neutral monikers like rsync, or Xfce, or Linux, but the ship has sailed.
It just gets normalized and the edginess erased with time and daily use. Imagine that somebody would suggest that you have to wash your hair with some "sham poo". They would be laughed out of the room, had not this word existed for centuries.
You name tools to your management? They won’t know what anything is anyway? The word database (some sort of Excel!) is usually an issue already. Why would you go into that much detail; just don’t?
Ha! Do you think this is uncommon? The path to becoming a manager in many organizations is biased towards those who are good at politics, loyal, a friend, and charismatic—all sorts of things that don't ensure effective approaches to technology selection.
Idk, to me this looks like a data modeling issue. There should be a team table that contains team-specific data such as the skill level; then these two queries wouldn't run into any problems.
There were many interesting design decisions, such as:
- whether to use multiple snapshots or a single snapshot per statement
- how to handle read uncertainty intervals
- how to incorporate SELECT FOR UPDATE locking into Raft
- how to handle SELECT FOR UPDATE subqueries
- how to prevent lost update anomalies between two UPDATEs
Some of the gory details are in the public RFC: https://github.com/cockroachdb/cockroach/blob/master/docs/RF...
This blog post just discusses the last point, but please AMA.
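For anyone unfamiliar with the last bullet: a lost update is the classic read-modify-write race, where two transactions each read the same value and write back their own increment. A toy sketch (not CRDB semantics, just the anomaly itself):

```python
# Two "transactions" both snapshot the counter before either writes.
balance = {"v": 100}

snap1 = balance["v"]   # txn1 reads 100
snap2 = balance["v"]   # txn2 reads 100

balance["v"] = snap1 + 1   # txn1 commits 101
balance["v"] = snap2 + 1   # txn2 commits 101, clobbering txn1's write

lost_update_result = balance["v"]   # one increment was lost
```

Preventing this under read committed is what the blog's UPDATE machinery is about: the second writer must either see the first writer's committed value before writing (re-evaluating the row, as PG's EvalPlanQual does) or abort.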