Finally. I assume there must be good reasons beyond "that's what Hadoop has alwa...

nemothekid · on March 31, 2021

Before Raft (2013), if you wanted a reliable, consistent distributed metadata store, you had to implement Paxos, which is notoriously difficult to get right. Every service that needed some type of leader election or a highly consistent store let Zookeeper deal with that problem (Mesos, Spark, Druid, Storm, and a ton of others).

After Raft, it became easier to just implement that layer yourself, so most projects after Raft (or, probably more accurately, once people started seeing how stable etcd was, ~2014) just used Raft internally where they would previously have used Zookeeper.

dikei · on March 31, 2021

To be fair, many projects' Raft implementations contained errors that can lead, and have led, to data loss, so it's not all sunshine and roses.

IMHO, it's still easier to delegate the consensus problem to a third-party service like Zookeeper or etcd.

nvarsj · on March 31, 2021

Raft is a large improvement over Paxos for practical implementations, but it's still tricky to get right. As far as I know, the only widely used, battle-tested Raft implementation is github.com/hashicorp/raft, which is why so many distributed systems have been built in Go over the last few years. I don't know if there is any Java Raft implementation that has reached that level of maturity yet, but it seems like Confluent is trying with KRaft.

mavelikara · on March 31, 2021

What are the notable projects that implement Raft internally for leader election?

Also, do any of those projects publish their Raft implementation as a library for other projects to include?

beberlei · on March 31, 2021

Anything from HashiCorp: Vault, Consul, Nomad, for example. Yes, there is a Go library for the basic Raft setup, afaik.

anonymousDan · on March 31, 2021

Because historically implementing something like Zookeeper yourself from scratch is notoriously difficult?

doliveira · on March 31, 2021

I guess what I wonder is why they didn't go with an embedded library or something of sorts. Some NoSQL databases handle it without Zookeeper.

nemothekid · on March 31, 2021

>Some NoSQL databases handle it without Zookeeper.

Most NoSQL databases now use Raft, which didn't exist when Kafka was created. Other NoSQL databases at the time were not as stable as Zookeeper, or had silent bugs that ate data (see aphyr's Jepsen series[1], which thoroughly tested several NoSQL databases and found many to be failing, except for Zookeeper).

[1] https://aphyr.com/tags/jepsen

tammerk · on March 31, 2021

https://github.com/jepsen-io/jepsen/issues/399

> Yeah! I mean, I find a lot of linearizability errors in various databases, but this was also my very first time doing this kind of test, and it varies from system to system. Could have easily slipped through the cracks.

In summary, aphyr thought Zookeeper was linearizable even though it doesn't provide linearizable ops.

Looks like Zookeeper needs to be tested again.

dbt00 · on March 30, 2021

I wasn't there when they made the call, but "we know it works" seems like it was the key element here.