If you use Kafka in an event sourced fashion, whereby each service emits events that it puts on a bunch of Kafka topics, and each service that needs that information consumes those events (by reading and applying those events to its local understanding of the world), I argue that you have effectively made a massive distributed and shared database - shared between all the services.
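To make that concrete, here is a minimal sketch of what each such consuming service ends up doing: a plain Java consumer that replays another service's event topic into a local view. The topic name "order-events", the group id and the string-serialized events are just illustrative assumptions, not anyone's actual setup.

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import java.time.Duration;
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;
    import java.util.concurrent.ConcurrentHashMap;

    public class OrderViewUpdater {
        // The service's local "understanding of the world": orderId -> last known status.
        private final Map<String, String> orderStatusById = new ConcurrentHashMap<>();

        public void run() {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "shipping-service");
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("auto.offset.reset", "earliest"); // start from the beginning: replay the log

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(List.of("order-events")); // another service's event topic
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        // Apply each event to the local view. The event format is whatever
                        // the producing service emits, i.e. its internal representation.
                        orderStatusById.put(record.key(), record.value());
                    }
                }
            }
        }
    }

Multiply this by every service that needs the data, and each of those "local understandings" is a de facto replica built from the producing service's internals.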

This flies smack in the face of the idea of "one service, one database" 1), where there is a distinct boundary and each service owns its own data. How a given service stores its data is of no concern to any other service - the communication between them is done using e.g. REST or messages, with a clear contract.

Event sourcing / Kafka architectures are the exact opposite of a clear contract. You are exposing the absolute, deep-down innards of the storage solution of a service, by putting its microscopic events down for all to see, and all to consume. You may do aggregations, emitting more coarse-grained "events" or state changes, thus kinda also exposing "views" of these inner tables, and maybe use a naming scheme to sort out which topics are "public" and which are not.

In the beginning, I really did find the concept of event sourcing to be extremely intriguing: the "you can get back to any point in history by replaying up to that point", the "forget the databases, just emit your state changes as events" (I truly "hate" databases, as they are the embodiment of state, which is the single thing that makes my field hard!), the ability to throw up a new projection (a new "internal view" of the state) for anything you'd need, and the fact that unit testing could be really smooth.
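That appeal is easy to show in a few lines: a projection is just a fold over the event history, and "getting back to a point in history" is stopping the fold early. A toy illustration, assuming a hypothetical AccountEvent type with amounts in cents and events already in timestamp order:

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class BalanceProjection {
        // Hypothetical event type; assumes the history is ordered by timestamp.
        record AccountEvent(String accountId, long amountCents, long timestampMillis) {}

        // A projection is a fold over the history; stopping early rewinds the clock.
        static Map<String, Long> projectBalances(List<AccountEvent> history, long upToMillis) {
            Map<String, Long> balances = new HashMap<>();
            for (AccountEvent e : history) {
                if (e.timestampMillis() > upToMillis) {
                    break; // "any point in history": ignore everything after this instant
                }
                balances.merge(e.accountId(), e.amountCents(), Long::sum);
            }
            return balances;
        }
    }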

Upon delving deeper into the concepts, I found that the totality of such a system must quite quickly become extremely heavy: The sheer amount of events, thus needing snapshots. Evolution of events, where you might have no idea who consumes them (that is, the "shared database" problem). The necessary understanding, throwing a half-century or more of relational databases under the bus. The performance tweaking if things start to lag. Etc etc etc. It would become a distributed system in the worst way a distributed system could be distributed: all state laid out in minute detail, few abstractions, and a massive, diverse set of different implementations in the different services just to get back to a relational view of the data. And this is even before the system gets a decade old, with lots of different developers coming and going, all of them needing to get up to speed, and the total system needing extremely good and hard steering to not devolve into absolute utter chaos.

That Kafka can be employed and viewed as a database has been argued hard by Confluent. Here's their former DevEx leader Tim Berglund explaining how databases are like onions: at the base of every database there is a commit log, and that log is equivalent to a set of Kafka topics. 2) So why not just implement all the other layers of the database in application logic? Confluent even has a lot of libraries and whatnot to enable such architectures.
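For reference, the "implement the rest of the database in application logic" part typically looks something like this with Kafka Streams: materialize a topic (the "commit log") into a KTable, i.e. a local latest-value-per-key store. The topic name and application id below are just placeholders.

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.Consumed;
    import org.apache.kafka.streams.kstream.KTable;
    import java.util.Properties;

    public class LogAsTable {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "customer-view");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

            StreamsBuilder builder = new StreamsBuilder();
            // Read the log (a topic) and materialize it as a table: the latest value per key,
            // backed by a local state store (RocksDB by default) - the "table" layer of the
            // onion, rebuilt from the log inside the application.
            KTable<String, String> customers = builder.table(
                    "customer-events",
                    Consumed.with(Serdes.String(), Serdes.String()));

            // Trivial use of the table, just so the sketch does something observable.
            customers.toStream().foreach((id, value) -> System.out.println(id + " -> " + value));

            new KafkaStreams(builder.build(), props).start();
        }
    }

Which is exactly the point above: every consuming service ends up rebuilding its own tables from someone else's log.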

1) "Microservice Architectures", patterns: https://microservices.io/patterns/data/database-per-service....

2) JavaZone Talk from 2018 by Tim@Confluent: https://2018.javazone.no/program/3a9644e6-15b5-4c66-a28c-c35...
