
In Postgres, I never have to worry that the server will accidentally be started with `retention.bytes` or `retention.ms` set too low and, as a result, quietly delete everything in the database, generating a wholly artificial "disaster" that means long disruption or downtime at a minimum, and permanent data loss at worst.
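
To make that concrete, here's roughly how little it takes with the Admin API (the broker address and topic name below are made up; `retention.ms` and `retention.bytes` are the real topic-level settings):

    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AlterConfigOp;
    import org.apache.kafka.clients.admin.ConfigEntry;
    import org.apache.kafka.common.config.ConfigResource;
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;

    public class LowerRetention {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // illustrative broker address
            try (AdminClient admin = AdminClient.create(props)) {
                // Illustrative topic name.
                ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "orders");
                // Two small numbers: keep at most ~1 MB or 60 seconds of data per partition.
                List<AlterConfigOp> ops = List.of(
                    new AlterConfigOp(new ConfigEntry("retention.ms", "60000"), AlterConfigOp.OpType.SET),
                    new AlterConfigOp(new ConfigEntry("retention.bytes", "1048576"), AlterConfigOp.OpType.SET));
                admin.incrementalAlterConfigs(Map.of(topic, ops)).all().get();
                // From here on, the broker silently drops older segments during normal log cleanup.
            }
        }
    }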

It is true that someone could issue `DROP DATABASE`, `rm -rf` the filesystem on the database server, and so forth, so my point is not that other systems are invincible. It's just that a properly configured RDBMS is designed to take data integrity extremely seriously and provides numerous failsafes and protective mechanisms to ensure that any data "loss" is absolutely intentional.

In an RDBMS, things like foreign key constraints prevent deletion of dependent records, mature and well-defined access control systems prevent accidental or malicious record alteration, concurrency models and transactions keep data in a consistent state, and so on. Kafka, on the other hand, is designed to automatically and silently delete/purge data whenever a couple of flags are flipped.
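
For example, here's a quick sketch of the foreign-key point (connection details and table names are invented for illustration); Postgres simply refuses the delete:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class FkDemo {
        public static void main(String[] args) throws Exception {
            // Illustrative connection string and credentials.
            try (Connection conn = DriverManager.getConnection(
                     "jdbc:postgresql://localhost:5432/shop", "app", "secret");
                 Statement st = conn.createStatement()) {
                st.execute("CREATE TABLE customers (id BIGINT PRIMARY KEY)");
                st.execute("CREATE TABLE orders (id BIGINT PRIMARY KEY, "
                         + "customer_id BIGINT NOT NULL REFERENCES customers(id))");
                st.execute("INSERT INTO customers VALUES (1)");
                st.execute("INSERT INTO orders VALUES (10, 1)");
                try {
                    st.execute("DELETE FROM customers WHERE id = 1"); // a dependent order still exists
                } catch (SQLException e) {
                    // The foreign key constraint blocks the delete; the data stays put.
                    System.out.println("delete rejected: " + e.getMessage());
                }
            }
        }
    }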

That is not a flaw in Kafka itself; it's designed to do that so that you don't have to interrupt your day and purge expired/old/processed data all the time. It's a flaw in architectures that misinterpret Kafka's log paradigm as a replacement for a real data storage/retrieval/archive system.

I've had this argument countless times with people who've tried to use RabbitMQ as a system of record (if only for a few minutes while the messages sat in queue). There's a fundamental disconnect for a lot of developers: the fact that something accepted the handoff doesn't mean the data is inherently safe.




Kafka is a fine replacement for an RDBMS if it fits your particular use case. It has very strong data consistency guarantees - stronger than most RDBMS - if you configure it properly (`acks=all`, `min.insync.replicas`, et al.). It won't even lose data if the leader of a partition crashes during a commit.
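
For reference, the producer side looks roughly like this (broker address and topic name are placeholders; the topic also needs a replication factor above 1 and a sensible `min.insync.replicas` for the guarantee to hold):

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;
    import java.util.Properties;

    public class DurableProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");   // illustrative
            props.put("key.serializer", StringSerializer.class.getName());
            props.put("value.serializer", StringSerializer.class.getName());
            props.put("acks", "all");                 // wait for all in-sync replicas to acknowledge
            props.put("enable.idempotence", "true");  // retries don't produce duplicates
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("orders", "order-42", "payload"));
                producer.flush();
            }
        }
    }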

It has been explicitly designed for these use cases and even has features like compaction:

https://kafka.apache.org/documentation/#compaction
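
For instance, a compacted topic can be declared like this (topic name, partition and replica counts, and broker address are all made up); with `cleanup.policy=compact` the broker keeps the latest record for each key instead of expiring data by age:

    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.NewTopic;
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;

    public class CompactedTopic {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // illustrative
            try (AdminClient admin = AdminClient.create(props)) {
                NewTopic topic = new NewTopic("user-profiles", 3, (short) 3)  // illustrative: 3 partitions, RF 3
                        .configs(Map.of("cleanup.policy", "compact"));        // retain the latest value per key
                admin.createTopics(List.of(topic)).all().get();
            }
        }
    }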

Now, I agree with you that in most cases, using Kafka as your primary data store instead of an RDBMS is madness, but that doesn't mean it's a bad idea in general.


Isn't that what 'in general' means?


I don't mean to be cavalier about misconfiguration, but it's not like the retention period is the Sword of Damocles. It's a configuration setting, and Kafka honors it reliably. As other commenters have pointed out, there are plenty of other ways to cause data loss with any system, no matter how careful you are. Those stories will continue to grace post-mortem blog posts long after we are gone, but stories of Kafka accidentally not retaining data do not seem to be thick on the ground. Any non-trivial system has its rough edges, but this just doesn't seem to be one of them for Kafka.


So, remove the flag that you'll never use and recompile?

I'm not sure this is worse than using a Unix box with a working "rm" command.


For what it's worth, I've known sysadmins who strip their boxes to the bone and take pains to ensure that the "rm" command can't be accidentally invoked, primarily by ensuring it doesn't exist on the box. They carry their own utilities from box to box, and take them with them when they leave.

That said, any slightly sane permission or access control scheme, including the defaults mandated by almost all RDBMS distributions (which want a dedicated system user), would make it rather difficult to rm the database folder. Just opening a shell on an RDBMS's underlying server should be a rare event in itself, to say nothing of actually elevating to root, or running a sloppy rm command that slips past the numerous failsafes sysadmins have been installing for decades (constraining superuser access to a pre-defined set of commands, for example).

Again, the point is not that an RDBMS is invincible. It's just that it's much sturdier, and actually designed to serve this purpose.

In what universe is "Well, hack out the dangerous parts" a reasonable answer? Talk about reckless disregard for data integrity! Do you really want to use Kafka so badly that you'd develop, maintain, and thoroughly test a custom patchset that circumvents its eviction routines, rather than just use the systems that already excel at not deleting things?

Secondly, eviction is a core part of a message queue's design, on purpose. It's a genuinely needed feature, and while I'm not a Kafka dev, I seriously doubt it's so simple that you can disable a single flag and move on.


Disabling a flag is likely a one-line change, assuming a reasonable flag library. But yes, maintaining a custom fork at all is not something to take on lightly. It would make more sense to talk to the Kafka developers about how to make it safer to use in this scenario.



