
Finally. I assume there must be good reasons beyond "that's what Hadoop has always used", but philosophically I never understood why you would introduce yet another network dependency just to handle elections. It really adds to the operational complexity, from having to manage the Zookeeper cluster to having to fight against DNS.



Before Raft (2013), if you wanted a reliable, consistent distributed metadata store, you had to implement Paxos, which is notoriously difficult to get right. Every service that needed some type of leader election or a highly consistent store let Zookeeper deal with that problem (Mesos, Spark, Druid, Storm, and a ton of others).

After Raft, it became easier to just implement that layer yourself, so most projects started after Raft (or, probably more accurately, once people started seeing how stable etcd was, ~2014) just used Raft internally where they would previously have used Zookeeper.


To be fair, many projects' Raft implementations contained errors that can lead, and have led, to data loss, so it's not all sunshine and roses.

IMHO, it's still easier to delegate the consensus problem to a third-party service like Zookeeper or etcd.


Raft is a large improvement over Paxos for practical implementations, but it's still tricky to get right. As far as I know, the only widely used, battle-tested Raft implementation is github.com/hashicorp/raft, which is why so many distributed systems have been built in Go over the last few years. I don't know if there is any Java Raft implementation that has reached that level of maturity yet, but it seems like Confluent is trying with KRaft.


What are the notable projects that implement Raft internally for leader election?

Also, do any of those projects publish their Raft implementation as a library for other projects to include?


Anything from HashiCorp: Vault, Consul, and Nomad, for example. Yes, there is a Go library for the basic Raft setup, AFAIK.


Because historically implementing something like Zookeeper yourself from scratch is notoriously difficult?


I guess what I wonder is why they didn't go with an embedded library or something of the sort. Some NoSQL databases handle it without Zookeeper.


>Some NoSQL databases handle it without Zookeeper.

Most NoSQL databases now use Raft, which didn't exist when Kafka was created. Other NoSQL databases at the time were not as stable as Zookeeper, or had silent bugs that ate data (see aphyr's Jepsen series[1], which thoroughly tested several NoSQL databases and found many to be failing, except for Zookeeper).

[1] https://aphyr.com/tags/jepsen


https://github.com/jepsen-io/jepsen/issues/399

> Yeah! I mean, I find a lot of linearizability errors in various databases, but this was also my very first time doing this kind of test, and it varies from system to system. Could have easily slipped through the cracks.

In summary, aphyr thought Zookeeper was linearizable even though it doesn't provide linearizable ops.

Looks like Zookeeper needs to be tested again.


I wasn't there when they made the call, but "we know it works" seems like it was the key element here.



