> Okay, I can see that point, but is it worth the additional latency between broker and Bookie?
It might depend on what you're ingesting and how much. Being able to independently scale ingest and storage is a good option to have. It's not only ingest, though; it's also consumption. As it stands, having a parallel consumer over a large partition spanning several GBs also requires tons of RAM because a segment must be loaded into memory. In that sense, reprocessing historical data is pretty difficult. There's a lot of complexity hidden in additional Druid or HDFS installations, or object storage shoe-horned in with its own indexing, to support semi-fast access to historical data.
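To make the reprocessing point concrete, here's a minimal sketch of replaying a partition from the oldest retained offset with the standard Kafka Java consumer. The broker address, topic name, and group id are placeholders, not anything from this thread. Each extra consumer you park at a different historical offset is another reader pulling different segment regions through the broker, which is where the memory pressure discussed here comes from.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class Replay {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumption: local broker
        props.put("group.id", "replay-demo");              // hypothetical group id
        props.put("key.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            TopicPartition tp = new TopicPartition("events", 0); // hypothetical topic
            consumer.assign(List.of(tp));
            // Rewind to the oldest offset still on disk; the broker now has to
            // serve historical segments, not just the hot tail of the log.
            consumer.seekToBeginning(List.of(tp));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                if (records.isEmpty()) break; // demo-only stop condition: caught up
                records.forEach(r ->
                    System.out.printf("offset=%d value=%s%n", r.offset(), r.value()));
            }
        }
    }
}
```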
> having a parallel consumer over a large partition spanning several GBs also requires tons of RAM because a segment must be loaded into memory.
I don't know too much about Kafka's internals, but that's not my experience of reading several terabytes of data from a Kafka topic. Memory didn't blow out, although we did burn through IOPS credits.
Good remark. That holds when there's one consumer, or multiple consumers hanging on to roughly the same offset (so they share the same memory-mapped segment). Having many consumers at widely different offsets will cause many segments to sit in RAM.
Edit: Kafka apparently does not store a complete log segment in memory, only parts of it, but having many consumers may still lead to a lot of churn or a lot of memory being consumed. Maybe this is getting better.
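To illustrate what "only parts" means here: mapping a file doesn't load it eagerly; pages are faulted in on access and live in the OS page cache, outside the JVM heap. This is a generic Java NIO sketch, not Kafka's actual code, and the segment path is made up. It just shows why resident memory grows with the pages readers actually touch, so many consumers at divergent offsets churn the cache without blowing the heap.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapRead {
    public static void main(String[] args) throws IOException {
        // Hypothetical segment file; Kafka names segment files by base offset.
        Path segment = Path.of("/var/kafka-logs/events-0/00000000000000000000.log");
        try (FileChannel ch = FileChannel.open(segment, StandardOpenOption.READ)) {
            // map() reserves address space but reads nothing yet; a default
            // 1 GiB segment fits well under the 2 GiB per-mapping limit.
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            long sum = 0;
            for (int i = 0; i < buf.limit(); i += 4096) {
                sum += buf.get(i); // touching one byte per page faults that page in
            }
            System.out.println("touched pages, byte sum: " + sum);
        }
    }
}
```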