
Distributed SQL data warehouses? Start with the papers:

- C-Store (column-oriented database storage and processing): http://db.lcs.mit.edu/projects/cstore/vldb.pdf and https://sookocheff.com/post/databases/c-store/

- BigQuery (the original Dremel-based query engine): https://cloud.google.com/files/BigQueryTechnicalWP.pdf

- Snowflake paper: https://www.snowflake.com/resource/sigmod-2016-paper-snowfla...

- Andy Pavlo's CMU distributed OLAP databases lecture: https://www.youtube.com/watch?v=dPMc7FZ3Gqo&t=1s

- MemSQL architecture: https://www.youtube.com/watch?v=HoompXCG5Mk

Start there, then read more about the architecture of the various products. You'll learn the basics: distributed nodes that store data in column-oriented tables partitioned by a key, and how queries run over them using various performance techniques. Then you can dive into what makes each database different.
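To make the "column-oriented tables partitioned by a key" idea concrete, here is a toy sketch in Python. It is not how any of the products above are implemented; the `Node` class, the `node_for` helper, the three-node cluster size, and the sample rows are all made up for illustration. It shows rows being hash-partitioned across nodes, each node storing its share column by column, and a grouped sum running as local partial aggregates that are merged at the end.

```python
from collections import defaultdict
from zlib import crc32

NUM_NODES = 3  # hypothetical cluster size, chosen only for the example


class Node:
    """One worker holding its partition of the table as per-column lists."""

    def __init__(self):
        self.columns = {"user_id": [], "country": [], "amount": []}

    def insert(self, row):
        # Column-oriented layout: each field is appended to its own list.
        for name, values in self.columns.items():
            values.append(row[name])

    def partial_sum_by(self, group_col, value_col):
        # Scan only the two columns the query needs, not whole rows.
        totals = defaultdict(float)
        for key, value in zip(self.columns[group_col], self.columns[value_col]):
            totals[key] += value
        return totals


def node_for(key):
    # Deterministic hash partitioning on the partition key.
    return crc32(str(key).encode()) % NUM_NODES


def load(nodes, rows):
    # Route each row to the node that owns its partition key.
    for row in rows:
        nodes[node_for(row["user_id"])].insert(row)


def sum_by(nodes, group_col, value_col):
    # Scatter: every node aggregates locally. Gather: merge the partials.
    merged = defaultdict(float)
    for node in nodes:
        for key, subtotal in node.partial_sum_by(group_col, value_col).items():
            merged[key] += subtotal
    return dict(merged)


nodes = [Node() for _ in range(NUM_NODES)]
rows = [
    {"user_id": 1, "country": "US", "amount": 10.0},
    {"user_id": 2, "country": "DE", "amount": 5.0},
    {"user_id": 3, "country": "US", "amount": 2.5},
    {"user_id": 4, "country": "DE", "amount": 1.0},
]
load(nodes, rows)
result = sum_by(nodes, "country", "amount")
print(result)
```

Real systems layer compression, vectorized execution, data skipping, and shuffles for joins on top of this, which is roughly where the products start to differ.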




No, sorry, I didn't mean where to read about how these things are designed (though these are useful and interesting links for that too, thanks!). What I'm asking about is what to follow (blogs, conferences, etc.) to keep up with new developments in the space: new products, experiences with them, reviews of their trade-offs in practical use, that kind of thing. But I do appreciate the response!


HN is a great resource with plenty of important news posted, along with the HighScalability blog [1] and SoftwareEngineeringDaily podcast [2].

1. http://highscalability.com/

2. https://softwareengineeringdaily.com/

Other than that, reading the blogs of the various vendors is how I keep up (using Feedly with RSS). The modern projects are Redshift, BigQuery, Snowflake, Azure DW, MemSQL, ClickHouse, and Yellowbrick, with older projects being Vertica, Teradata, and Greenplum. It's also useful to follow the "new" distributed SQL projects like CockroachDB, Citus, TiDB, Vitess, and Yugabyte.


Thank you, this is super useful to me!



An architecture that combines a row-based store with a column-based store might also be interesting to explore. This post explains how they work together: https://pingcap.com/blog/delivering-real-time-analytics-and-...



