
Apache Iceberg builds an additional layer on top of Parquet files that lets you do ACID transactions, rollbacks, and schema evolution.

A Parquet file is a static file that holds all the data for a table. You can't insert, update, delete, etc. That's it. It works OK if you have small tables, but it becomes unwieldy if you need to do whole-table replacements each time your data changes.

Apache Iceberg fixes this problem by adding a metadata layer on top of smaller Parquet files (that's the 300,000 ft overview).
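To make the "metadata layer on top of smaller files" idea a bit more concrete, here's a toy sketch in plain Python. To be clear, this is not Iceberg's actual file layout or API. `ToyTable`, `Snapshot`, and the file names are all made up for illustration. The only point is that data files stay immutable, and a "commit" or "rollback" just changes which list of files counts as the current table.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    """Immutable list of the data files that make up the table at one point in time."""
    snapshot_id: int
    data_files: tuple


@dataclass
class ToyTable:
    """Toy stand-in for a table format's metadata layer (not Iceberg's real layout)."""
    snapshots: list = field(default_factory=lambda: [Snapshot(0, ())])
    current_id: int = 0

    def current(self) -> Snapshot:
        return self.snapshots[self.current_id]

    def append(self, new_file: str) -> Snapshot:
        # Existing data files are never modified. A commit records a new
        # snapshot listing the old files plus the new one, then moves the pointer.
        snap = Snapshot(len(self.snapshots), self.current().data_files + (new_file,))
        self.snapshots.append(snap)
        self.current_id = snap.snapshot_id
        return snap

    def rollback(self, snapshot_id: int) -> None:
        # Rolling back is only a pointer change; no data file is rewritten.
        if not 0 <= snapshot_id < len(self.snapshots):
            raise ValueError(f"unknown snapshot {snapshot_id}")
        self.current_id = snapshot_id


table = ToyTable()
table.append("part-0001.parquet")  # hypothetical file names
table.append("part-0002.parquet")
files_after_two_appends = table.current().data_files

table.rollback(1)  # back to the state after the first append
files_after_rollback = table.current().data_files
```

A real table format also has to make the pointer swap atomic and track schemas, partitions, and deletes, all of which this sketch leaves out.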




I know you’re not OP, and while this explanation is good, it doesn’t make sense to frame all this as a “problem” for Parquet. It’s just a file format; it isn’t intended to have this sort of scope.


The problem is that the "parquet is beautiful" attitude gets extended all the time to pointless things: pq doesn't support appending updates, so let's merge thousands of files together to simulate a real table. Totally good and fine.


Well… when Parquet came out, it was the first necessary evolutionary step toward solving the lack-of-metadata problem in CSV extracts.

So, it is CSV++ so to speak, or CSV + metadata + compact data storage in a single file, but not a database table gone astray to wander the world on its own as a file.


> Apache Iceberg builds an additional layer on top of Parquet files that lets you do ACID transactions, rollbacks, and schema evolution.

Delta format also supports this, correct?


Correct. They have feature parity, basically.



