
I think that your description of exactly-once messaging is inaccurate.

> This is the holy grail of messaging, and also the fountain of a lot of snake-oil.

Exactly-once message delivery is quite possible with messaging systems that support transactions. When combined with other transactional resources (e.g. a database) and a distributed transaction monitor, exactly-once messaging works well and is rock-solid reliable. The grand-daddy of message brokers, IBM MQ, is absolutely capable of exactly-once messaging.
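Roughly what a transactional send looks like with the pymqi client, as a sketch only (the queue manager, channel, and queue names below are placeholders, not anything from this thread):

```python
import pymqi

# Placeholder connection details and queue names, for illustration only.
qmgr = pymqi.connect('QM1', 'DEV.APP.SVRCONN', 'localhost(1414)')
queue = pymqi.Queue(qmgr, 'DEV.QUEUE.1')

# Put the message under syncpoint: it stays invisible to consumers
# until the unit of work commits.
pmo = pymqi.PMO()
pmo.Options = pymqi.CMQC.MQPMO_SYNCPOINT
queue.put(b'order created', pymqi.MD(), pmo)

# Commit makes the send durable; backout() would discard it and roll
# back any other resources enlisted in the same unit of work.
qmgr.commit()

queue.close()
qmgr.disconnect()
```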




AWS makes the same claim with FIFO SQS, and maybe I’m getting it wrong, but these claims 1) have a lot of caveats and 2) only work inside the boundaries of the messaging system.

There’s a note in the next paragraph about how these systems manage to say that: if you pass in the same message ID / token within X minutes the message won’t be duplicated, and by ensuring FIFO there’s a side effect of not giving out the next message until the current one is acknowledged.
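For reference, that token is the MessageDeduplicationId on an SQS FIFO queue. A minimal boto3 sketch (the queue URL and IDs are placeholders):

```python
import boto3

sqs = boto3.client("sqs")
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/orders.fifo"  # placeholder

# A second send with the same MessageDeduplicationId inside the
# 5-minute deduplication window is accepted by the API but dropped.
sqs.send_message(
    QueueUrl=queue_url,
    MessageBody='{"order_id": 42}',
    MessageGroupId="orders",
    MessageDeduplicationId="order-42-created",
)
```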

This leads to a situation where there’s a guarantee of exactly once acknowledgement, but not necessarily exactly-once processing or delivery. Given that the semantics of at-most-once and at-least-once apply to processing and delivery, I personally don’t think the goalposts should move on exactly once.

Systems claiming exactly-once lull developers into not planning for multiple deliveries on the subscriber, or the need to do multiple publishes, both of which can still happen.


It's better to use an SQS standard queue and have the consuming system provide the exactly-once processing guarantee, for various reasons. You will need to introduce something like Redis if you are not already using it, but I still think it's net superior to using an SQS FIFO queue if you want exactly-once processing.
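Something like the following, assuming redis-py and a TTL long enough to cover redeliveries (all names are illustrative):

```python
import redis

r = redis.Redis()

def process_once(message_id: str, handler, ttl_seconds: int = 86400) -> bool:
    # SET NX succeeds only for the first consumer to claim this message ID.
    claimed = r.set(f"seen:{message_id}", "1", nx=True, ex=ttl_seconds)
    if not claimed:
        return False  # duplicate delivery; skip it
    try:
        handler()
    except Exception:
        # If processing fails after claiming, release the key so the
        # redelivered message can be retried -- one of the edge cases
        # mentioned further down the thread.
        r.delete(f"seen:{message_id}")
        raise
    return True
```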


Might not even need to use Redis. If the message has a proper idempotency ID, a transactional database is more than enough. If the consumer is running MySQL/Postgres/DynamoDB etc., nothing else is needed.
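E.g. with Postgres and psycopg2, assuming a processed_messages table keyed on the message ID (the DSN and table/column names are illustrative):

```python
import psycopg2

conn = psycopg2.connect("dbname=app")  # placeholder DSN

def handle(message_id: str, payload: str) -> None:
    # Claim the ID and apply the side effects in one transaction; a
    # redelivered message hits the primary key and is skipped.
    with conn, conn.cursor() as cur:
        cur.execute(
            "INSERT INTO processed_messages (id) VALUES (%s)"
            " ON CONFLICT (id) DO NOTHING",
            (message_id,),
        )
        if cur.rowcount == 0:
            return  # already processed
        cur.execute("INSERT INTO orders (payload) VALUES (%s)", (payload,))
```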


Not quite: there are always a bunch of edge cases that inevitably make "exactly once" actually "almost always exactly once, but always at least once."


Indeed. "exactly once" violates the CAP theorem, so if you actually make a system that can guarantee "exactly once" then you should apply for a Turing medal immediately.


I think that you are misunderstanding the CAP theorem. The CAP theorem states that in the event of a network partition, a system can either be consistent or available, but not both. So a messaging system that provided exactly once message delivery would not provide availability during a network partition. However, there are many applications for which consistency is more important than availability, especially if the period of unavailability can be limited.


Ah, but "especially if the period of unavailability can be limited" is exactly the type of edge case kasey_junk was talking about. Network partitions may persist for unbounded amounts of time as far as the CAP theorem is concerned, and an unspecified amount of packets may be dropped and/or delayed. It could be the case that every message you send gets dropped due to a persisting partition and in such a case none would arrive, thereby violating the "guarantee" of exactly-once delivery.

In practice I agree that these problems are quite rare, since most networks are reasonably stable. However, especially at scale it's not rare to see messages dropped or delivered more than once. I have no doubt IBM MQ can achieve exactly-once most of the time, but no distributed system can achieve exactly-once delivery all of the time.


> It could be the case that every message you send gets dropped due to a persisting partition and in such a case none would arrive, thereby violating the "guarantee" of exactly-once delivery.

That is not correct. All interactions between the client and the broker are performed in transactional units. If the transaction in which messages are sent fails to commit, then the messages are not sent, and all work is rolled back. Once a message is successfully sent (that is, sent and the transaction committed), it will be delivered once and only once to the receiver.

Likewise on the receiving side, a message is delivered and the encompassing transaction is committed once and only once. A message may be delivered more than once if the encompassing transaction is later rolled back due to, say, a network failure. But a message delivery in a transaction that does not commit is not a delivery.

The benefit here is that application programmers don't need to concern themselves with message duplicate checking, or with the risk that duplicate checking is done incorrectly, leading to bugs that are very difficult to identify.
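A sketch of that receiving side with pymqi, under the same assumptions as above (placeholder names, error handling trimmed):

```python
import pymqi

def handle(body: bytes) -> None:
    # Application work goes here, e.g. a database write enlisted in the
    # same global transaction via an XA coordinator.
    print(body)

qmgr = pymqi.connect('QM1', 'DEV.APP.SVRCONN', 'localhost(1414)')  # placeholders
queue = pymqi.Queue(qmgr, 'DEV.QUEUE.1')

gmo = pymqi.GMO()
gmo.Options = pymqi.CMQC.MQGMO_SYNCPOINT | pymqi.CMQC.MQGMO_WAIT
gmo.WaitInterval = 5000  # milliseconds

try:
    msg = queue.get(None, pymqi.MD(), gmo)
    handle(msg)
    qmgr.commit()    # the delivery "counts" only once this succeeds
except Exception:
    qmgr.backout()   # the message goes back on the queue; no delivery happened
    raise
```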


A transaction which is partition-tolerant in the way you're describing requires stronger semantics than mere client acknowledgement; it requires all participants to engage in the consensus protocol. Unless your application joins the message broker's topology as an active member -- some systems do work this way, like ZooKeeper -- it can still suffer message loss.

But even if it does join, that's still not sufficient, because these systems can become unavailable during partitions, and that is definitionally incompatible with "exactly once".


It’s been an age since I’ve worked with IBM MQ, and there are dials upon dials when setting up MQ-based systems, but it doesn’t offer exactly once in the face of broker failure in most of its HA configurations, and it uses deduplication at the protocol level to prevent duplicates.

When people say “exactly once” is impossible they really mean in the face of failure at the queue level.


> When people say “exactly once” is impossible they really mean in the face of failure at the queue level.

And what exactly is impossible with that? Just wait it out, i.e. like all the CP systems do (as per CAP).


The premise is that unavailability is the same as zero delivered messages, not one.

Note that none of this is rigorously defined, either in the article or by most message queues, and the configuration of queues/brokers/clients means that there are all manner of edge cases around delivery guarantees in practice.


"Wait it out" is only a valid strategy when message rates are low enough that you can buffer them all until the network partition goes away again.

As an example, imagine a system sending a million 1 KB messages per second. To survive a 1-minute network outage it would need 60 GB of extra storage to park the messages. If the outage lasts longer than it has space available, dropping messages becomes inevitable.
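The back-of-the-envelope math:

```python
msgs_per_sec = 1_000_000
msg_size_bytes = 1_000       # ~1 KB per message
outage_seconds = 60

buffer_gb = msgs_per_sec * msg_size_bytes * outage_seconds / 1e9
print(buffer_gb)             # ~60 GB just to park one minute of traffic
```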


Even in the face of network partitions?


See my answer to WJW above. Yes, even in the face of a network partition, but with system unavailability for the duration of the partition.


System unavailability means the messages get delivered zero times, which is not exactly once.


So we're just playing semantic games at this point, using different definitions of terms. The definition of "exactly once" you're using isn't the formal definition.



