
The analogy helped me understand, but in factorio belts are required primarily because everything takes physical space. What are real world use cases for kafka? Is it primarily if your data is too much to simply store in a db?



Push architectures put all of the control on the producer, but become quite painful in the face of an unreliable consumer. Pull architectures put all of the control on the consumer, but become quite painful in the face of an unreliable producer.

Message queues allow the producer to think it's participating in a push architecture, while the consumer believes it's a pull-based arch. Because the message queue itself does very little work, it gets to play a very well-behaved consumer insofar as the producer is concerned, while behaving like a well-behaved producer for the consumer. The queue's buffering then allows both the producer and the consumer to misbehave to some degree while the queue itself keeps the illusion of good behaviour.
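
A toy sketch of that dual illusion, using an in-memory buffer (Python's `queue.Queue` standing in for the broker; all names here are illustrative, not any real Kafka API):

```python
import queue
import threading

# A bounded buffer stands in for the message queue: the producer "pushes"
# into it and the consumer "pulls" from it, and neither talks to the other.
buf = queue.Queue(maxsize=100)

def producer():
    # From the producer's point of view this is a push architecture:
    # it hands off each event and moves on.
    for i in range(10):
        buf.put(f"event-{i}")

def consumer(out):
    # From the consumer's point of view this is a pull architecture:
    # it takes events at its own pace. A slow consumer only fills the
    # buffer; it doesn't block the producer until the buffer is full.
    for _ in range(10):
        out.append(buf.get())

received = []
t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer, args=(received,))
t1.start(); t2.start()
t1.join(); t2.join()
print(received[0], received[-1])  # event-0 event-9
```

The `maxsize` is where the "some degree" of tolerated misbehaviour lives: once the buffer is full, the producer finally feels back-pressure.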

Kafka specifically falls under the category of log-oriented message queues, which are eminently useful for distributing any workload that looks like "tail a log and process each line as it comes in" across a large number of nodes.
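
A minimal sketch of the log-oriented idea, in the spirit of (but not the API of) Kafka — the `Log` class here is made up for illustration. Records are read by position and never removed, so independent consumers can each "tail" the same log:

```python
# A toy append-only log with per-consumer offsets: reading a record
# only advances the reader's own position; nothing is ever deleted.

class Log:
    def __init__(self):
        self.records = []   # append-only storage
        self.offsets = {}   # consumer name -> next index to read

    def append(self, record):
        self.records.append(record)

    def poll(self, consumer):
        """Return this consumer's unread records and advance its offset."""
        start = self.offsets.get(consumer, 0)
        batch = self.records[start:]
        self.offsets[consumer] = len(self.records)
        return batch

log = Log()
log.append("line 1")
log.append("line 2")

print(log.poll("billing"))    # ['line 1', 'line 2']
log.append("line 3")
print(log.poll("billing"))    # ['line 3']
print(log.poll("analytics"))  # ['line 1', 'line 2', 'line 3'] - nothing was removed
```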


This. With the slight nuance that it works better if you don't think of it as queues, since that suggests messages being removed when processed.

The big win when introducing Kafka is that (apart from the schema) it completely decouples producers and consumers of messages. That in turn reduces the required coordination between independent teams within an organisation.


It’s a good theory, and definitely something you should be aware of, but... in my experience, if the producer fails you get nothing (of course), and if the consumer fails the messages just back up until the queue overflows. And if the queue itself fails - which can happen due to hardware or network issues as well as software issues - you have a new point of failure that’s unrelated to the producers or consumers. So the whole thing is only as good as the responsiveness of your monitoring team... which is effectively the case with synchronous message delivery too, only with the added complexity of troubleshooting the queue in the middle.


> What are real world use cases for kafka?

He covers it under the introduction "Why bother with async messaging?".

Basically, in the analogy the belts are queues. Factorio doesn't strictly require belts, but going without them quickly leads to problems with resource management, as you essentially have to lock the system until your item is done processing.


Let's say you have a few tens of millions of people using your webapp and you are collecting analytics/usage data from them. You definitely don't want them all trying to authenticate + write directly to a db; you need an intermediate layer to process all the data coming from tons of different sources.

You could potentially do that with a separate microservice you communicate with via HTTP, but this demands a "liveness" of the microservice that isn't really necessary: you will often lose events if the microservice can't keep up with the incoming load, because you have to process events exactly as fast as they arrive. The data flow is really just unidirectional, so the response is unnecessary; you just need to reliably transmit the data.

Kafka lets you handle unidirectional data flows in a lazier way. The data producers just write to a service and the consumers connect to that service; in between, Kafka just behaves like a distributed message queue. Obviously this is a huge benefit over directly writing to a db or any other kind of offline storage, since it can greatly reduce the connection overhead. The main benefit over using a microservice is that it relaxes the constraint that all the data be processed/handled exactly as it comes in. It makes non-critical data flows more resilient by adding this queue as a buffer.

(I don't think the linked article does a great job explaining why you would use kafka / what the alternatives are)


A database is a huge API to share with another team, and queries from other people add load with a principal-agent problem [1].

Give people a firehose of events or deltas, though, and if they want to query it, performance is now their problem - they build the database, update it, index it, etc.

This is part of scaling organisationally, not just removing the database as a bottleneck.

Events are also a route to making cache invalidation - one of the hardest problems in CS, as we know - tractable. Build your caches as services that consume the firehose, and invalidate based on a join between events and the cached data.
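
A minimal sketch of that pattern, with a plain dict standing in for the cache and hand-written change events (the key scheme and event fields are made up for illustration):

```python
# A cache service that consumes a stream of change events and invalidates
# the entries those events make stale.

cache = {"user:1": {"name": "Ada"}, "user:2": {"name": "Grace"}}

def apply_events(events):
    # Join each event against the cached keys: an event naming an entity
    # invalidates that entity's cache entry; the next read repopulates it.
    for event in events:
        key = f"{event['entity']}:{event['id']}"
        cache.pop(key, None)

apply_events([{"entity": "user", "id": 1, "change": "renamed"}])
print(sorted(cache))  # ['user:2']
```

The "join" in a real system can be richer than a key match - e.g. an event about an order invalidating the cached summary of the customer who placed it - but the shape is the same: events in, stale entries out.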

[1] https://en.m.wikipedia.org/wiki/Principal%E2%80%93agent_prob...





