
> offload old data into S3 and then retrieve it if required

I remember asking for it 5 years ago: https://radek-gruchalski.medium.com/the-case-for-kafka-cold-.... Confluent turned it into a paid feature.




Yep, it's available from Confluent if you pay for it. But just as Mirror Maker 2 is awfully similar to Confluent Replicator, I believe that Kafka will (eventually) get tiered storage under the Apache licence (I know there's a KIP for it[1]). It's a hard issue to solve, and I'm not sure how much effort in the community is being directed towards it. But bear in mind that it's not just Confluent who have a stake in Kafka: there's a bunch of big corps selling managed/supported Kafka, and all of them would probably quite like tiered storage in core Kafka as a feature to help them sell their support/management, so I have some faith in their enlightened self-interest.

The fact that MM2 happened, and Confluent didn't try to stop it, despite it being awfully similar to Replicator, makes me think that Confluent are acting in good faith.

Incidentally, I quite like how Pulsar solved tiered storage, and it's a definite tick in the Pulsar box. It's transparent from a consumer's POV, and although there is somewhat of a delay in rehydrating the offloaded block, I don't think anyone's expecting near-realtime performance when loading historical data.

[1]: https://cwiki.apache.org/confluence/display/KAFKA/KIP-405%3A...


Thanks for putting things in perspective, EdwardDiego.

> The fact that MM2 happened, and Confluent didn't try to stop it, despite it being awfully similar to Replicator, makes me think that Confluent are acting in good faith.

Let me share an anecdote related to this example. We (Confluent) were actually the ones who contributed the documentation for MirrorMaker v2 to the Apache Kafka docs (https://kafka.apache.org/documentation/#georeplication). The development lead on MM2 was (an engineer at) Cloudera, yet they never spent the time to provide user-facing documentation to the Kafka project. I don't want to speculate about reasons, yet I noticed that MM2 was documented in the Cloudera docs.

If we didn't care for the Kafka community at Confluent, we would not have spent our own resources and time to fill that gap, given that we have a proprietary product similar to MM2 (i.e., Confluent Replicator).

https://github.com/apache/kafka/pull/9983


Shit, wait, there's documentation for Mirror Maker 2 now? I spent most of my time implementing it by reading hypothetical examples in a KIP, and then diving into the actual code.

Hardly the most straightforward, and it was rather a gaping hole. Thanks for the background on how that hole developed.

I really appreciate Confluent putting that time into documenting something vital that could compete with your own product, and IMO that does put a nail in the coffin of the previous commenter's assertions about Confluent's alleged attempts to wall off necessary features of Kafka.



