
I'm not a data guy, can you explain this part a bit more if you have time?

> you have realtime data ingestion, you will also have to build tooling to re-ingest older data that has changed or needed to be amended. This will end up looking like a 'lambda architecture'




Lambda architecture for data processing, as popularized by Nathan Marz et al. [0], has two processing components, the Batch layer and the Stream (speed) layer, whose outputs are merged at query time by a serving layer. At a high level, Batch accepts staleness in exchange for quality, whilst Stream optimises for freshness at the expense of quality [1].

I believe what GP means by Lambda is that you'd need a system that batch-processes the data to be amended or changed (reprocessing older data) but stream-processes whatever is required in real time [2].
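To make the two layers concrete, here's a toy sketch (not Marz's code, and nothing like a real framework): the "aggregation" is just counting events per user, and the event shape `(user_id, timestamp)` is an assumption for illustration. The point is that amended history is picked up by recomputing the batch view, while the speed view only covers events since the last batch run.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical event shape: (user_id, timestamp).
Event = Tuple[str, int]

def batch_view(master: Iterable[Event]) -> Counter:
    """Batch layer: recompute from the full master dataset.
    Slow and stale, but corrections to old data are included."""
    return Counter(user for user, _ in master)

def speed_view(recent: Iterable[Event]) -> Counter:
    """Speed (stream) layer: incremental counts for events that
    arrived after the last batch run. Fresh, but partial."""
    view: Counter = Counter()
    for user, _ in recent:
        view[user] += 1
    return view

def query(batch: Counter, speed: Counter, user: str) -> int:
    """Serving layer: merge both views at query time."""
    return batch[user] + speed[user]

master = [("alice", 1), ("bob", 2), ("alice", 3)]
recent = [("alice", 4)]  # arrived after the last batch run
print(query(batch_view(master), speed_view(recent), "alice"))  # 3

# An old event gets amended (t=3 re-attributed to bob): only a
# batch recompute fixes it, which is the tooling GP is describing.
amended = [("alice", 1), ("bob", 2), ("bob", 3)]
print(query(batch_view(amended), speed_view(recent), "alice"))  # 2
```

That split is exactly why you end up maintaining two code paths: the streaming one for freshness and the batch one for re-ingesting corrected history.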

An alternative is the Kappa architecture proposed initially by Jay Kreps [3][4], co-creator of Apache Kafka.
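The gist of Kappa, as I understand Kreps's write-up, is to drop the separate batch system: keep the input log around, and when the logic changes or data needs reprocessing, run the same streaming job again from the start of the log into a new output table, then switch readers over. A toy sketch, where the log is a Python list and `v1`/`v2` are hypothetical versions of the job (v2 "fixes a bug" by ignoring negative amounts):

```python
from typing import Callable, Dict, List, Tuple

Event = Tuple[str, int]  # hypothetical (user_id, amount)
Table = Dict[str, int]
Processor = Callable[[Table, Event], None]

def v1(table: Table, event: Event) -> None:
    user, amount = event
    table[user] = table.get(user, 0) + amount

def v2(table: Table, event: Event) -> None:
    # Hypothetical bug fix: negative amounts are ignored.
    user, amount = event
    if amount > 0:
        table[user] = table.get(user, 0) + amount

def run_job(log: List[Event], process: Processor, start_offset: int = 0) -> Table:
    """Consume the log from start_offset into a fresh output table.
    Reprocessing is just running this again from offset 0."""
    table: Table = {}
    for event in log[start_offset:]:
        process(table, event)
    return table

log = [("alice", 5), ("alice", -2), ("bob", 3)]
live = run_job(log, v1)      # {'alice': 3, 'bob': 3}
replayed = run_job(log, v2)  # {'alice': 5, 'bob': 3}
serving = replayed           # switch readers, then retire v1's table
```

In a real deployment the log would be something like a Kafka topic with enough retention to replay from, and the new job runs alongside the old one until it catches up.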

---

[0] https://www.amazon.com/dp/1617290343

[1] https://en.wikipedia.org/wiki/Lambda_architecture

[2] https://speakerdeck.com/druidio/real-time-analytics-with-ope...

[3] https://engineering.linkedin.com/distributed-systems/log-wha...

[4] https://dataintensive.net/


The sources are good and thorough, but very long. Here's an OK summary of the Kappa proposal: https://milinda.pathirage.org/kappa-architecture.com/

In theory this sounds great, but you have to account for processing capacity.

While compute is getting cheaper, one of the key reasons the streaming layer in Lambda sacrifices quality for throughput is compute capacity (as well as timing). If you have to feed already-stored data through the same streaming pipe, you either need a lot of excess capacity, pay for the additional burst, or accept latency in your results (assuming you can keep up with your incoming workload and not lose data). There is no free lunch.
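Some back-of-the-envelope arithmetic shows why. If the replayed backlog shares the pipe with live traffic, only the spare capacity (capacity minus incoming rate) drains the backlog. The numbers below are made up for illustration, not from any real system:

```python
def catch_up_seconds(backlog_events: float, capacity_eps: float, incoming_eps: float) -> float:
    """Time to drain a replayed backlog while live traffic keeps arriving.
    Only spare capacity (capacity - incoming) works on the backlog."""
    spare = capacity_eps - incoming_eps
    if spare <= 0:
        return float("inf")  # never catches up; lag grows without bound
    return backlog_events / spare

# Hypothetical: 30 days of history at 10,000 events/s.
backlog = 10_000 * 86_400 * 30  # 25.92 billion events

print(catch_up_seconds(backlog, 12_000, 10_000) / 86_400)  # 150.0 days with 20% headroom
print(catch_up_seconds(backlog, 40_000, 10_000) / 86_400)  # 10.0 days with 4x capacity
```

So replaying a month of data in something like 10 days needs roughly 4x your steady-state capacity for that whole period; otherwise you're paying in latency instead.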


Here's a related article: https://medium.com/open-factory/state-of-the-m-art-big-data-...

An excerpt from the article:

Furthermore, the big data tools can be combined using a growing number of data processing architectures — Lambda and Kappa, among others.


Thanks so much for the comment, it was very helpful!



