
We ensure the CRDT is synced with at least two nodes in different geographical areas before returning an OK status to a write operation. We are using CRDTs not so much for their asynchronous replication properties (what is usually touted as eventual consistency), but more as a way to avoid conflicts between concurrent operations, so that we don't need a consensus algorithm like Raft. By combining this with the quorum system (two writes out of three need to be successful before returning OK), we ensure durability of written data without having to pay the synchronization penalty of Raft.
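To make the idea concrete, here is a minimal sketch of a quorum write over a CRDT. It is not Garage's actual code: the last-writer-wins register and the names `LWWRegister`, `Replica` and `quorum_write` are assumptions chosen for illustration, and the replicas are contacted sequentially rather than in parallel as a real system would do.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LWWRegister:
    """Last-writer-wins register, a simple CRDT (illustrative assumption)."""
    timestamp: int = 0
    node_id: str = ""
    value: str = ""

    def merge(self, other: "LWWRegister") -> "LWWRegister":
        # Keep the entry with the larger (timestamp, node_id) pair.
        # This merge is commutative, associative and idempotent, so
        # replicas converge regardless of the order updates arrive in.
        return max(self, other, key=lambda r: (r.timestamp, r.node_id))


class Replica:
    """Hypothetical storage node holding one register."""

    def __init__(self, name: str, up: bool = True):
        self.name = name
        self.up = up
        self.state = LWWRegister()

    def apply(self, update: LWWRegister) -> bool:
        # An unreachable node never acknowledges the write.
        if not self.up:
            return False
        self.state = self.state.merge(update)
        return True


def quorum_write(replicas, update: LWWRegister, quorum: int = 2) -> bool:
    """Return True (OK) only if at least `quorum` replicas acknowledged."""
    acks = sum(1 for replica in replicas if replica.apply(update))
    return acks >= quorum
```

With three replicas and one of them down, `quorum_write` still returns OK because two acknowledgements are enough. Two concurrent updates never need to be ordered by a leader: whichever order they are merged in, every replica ends up with the same value.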



> We ensure the CRDT is synced with at least two nodes in different geographical areas before returning an OK status to a write operation [...] we ensure durability of written data but without having to pay the synchronization penalty of Raft.

This is, in essence, strongly-consistent replication, in the sense that you wait for a majority of writes before answering a request, so you're still paying the latency cost of a round trip with at least one other node on each write. How is this any better than a Raft cluster with the same behavior (N/2+1 write consistency)?


Raft consensus apparently needs more round-trips than that (maybe two round-trips to another node per write?), as evidenced by this benchmark we made against Minio:

https://garagehq.deuxfleurs.fr/documentation/design/benchmar...

Yes, we do round-trips to other nodes, but we do far fewer of them to ensure the same level of consistency.

This is to be expected from a distributed systems theory perspective, as consensus (or total order) is a much harder problem to solve than what we are doing.

We haven't (yet) gone into dissecting the Raft protocol or Minio's implementation to figure out why exactly it is much slower, but the benchmark I linked above is already strong enough evidence for us.


I think it would be great if you could make a GitHub repo that is just about summarising the performance characteristics and round-trip types of different storage systems.

You would invite Minio/Ceph/SeaweedFS/etc. authors to make pull requests in there to get their numbers right and explanations added.

This way, you could learn a lot about how other systems work, and users would have an easier time choosing the right system for their problems.

Currently, one only gets detailed comparisons from HN discussions, which arguably aren't a great place for reference and easily get outdated.


Raft needs a lot more round trips than that: the leader needs to send a message about the transaction, the nodes need to confirm that they received it, and then the leader needs to send back that it was committed (no response required). This is largely implementation-specific (etcd does more round trips than that, IIRC), but that's the bare minimum.



