Apache Iceberg builds an additional layer on top of Parquet files that lets you...

pgwhalen · 2025-03-18T23:03:36

I know you’re not OP, and while this explanation is good, it doesn’t make sense to frame all this as a “problem” for Parquet. It’s just a file format; it isn’t intended to have this sort of scope.

hobs · 2025-03-19T04:46:27

The problem is that "Parquet is beautiful" gets stretched to pointless things all the time: Parquet doesn't support appending updates, so let's merge thousands of files together to simulate a real table. Totally good and fine.
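To make the complaint concrete, here is a toy sketch in plain Python of why "appending" to immutable files turns into file management. It uses no Parquet library; `ToyTable`, `append`, `scan`, and `compact` are hypothetical names. Every append writes a new file, readers must union all of them, and a separate compaction step is needed to merge the small files back together.

```python
from dataclasses import dataclass, field


# Hypothetical toy model: each "file" is an immutable tuple of rows,
# standing in for a Parquet file that cannot be appended to in place.
@dataclass
class ToyTable:
    files: list = field(default_factory=list)

    def append(self, rows):
        # Appending never mutates an existing file; it writes a new one.
        self.files.append(tuple(rows))

    def scan(self):
        # A reader simulates "the table" by unioning every file.
        return [row for f in self.files for row in f]

    def compact(self):
        # Compaction rewrites many small files as one larger file.
        self.files = [tuple(self.scan())]


table = ToyTable()
for i in range(1000):
    table.append([{"id": i, "value": i * 10}])

print(len(table.files))   # 1000 small files
table.compact()
print(len(table.files))   # 1
print(len(table.scan()))  # 1000 rows, unchanged by compaction
```

Table formats such as Iceberg and Delta ship a compaction operation for exactly this reason; the sketch only shows the shape of the problem, not how they implement it.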

inkyoto · 2025-03-19T00:00:37

Well… when Parquet came out, it was the first necessary evolutionary step toward solving the missing-metadata problem in CSV extracts.

So it is CSV++, so to speak: CSV + metadata + compact data storage in a single file, but not a database table gone astray to wander the world on its own as a file.
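The missing-metadata problem is easy to see with Python's standard `csv` module: typed values go in, and only strings come back, because a CSV file carries column names at most and no types. The sample row below is made up for illustration.

```python
import csv
import io

# Illustrative row with an int, a float, and a bool.
rows = [{"id": 1, "price": 9.5, "active": True}]

# Write the rows as CSV: only the header names survive, not the types.
buf = io.StringIO(newline="")
writer = csv.DictWriter(buf, fieldnames=["id", "price", "active"])
writer.writeheader()
writer.writerows(rows)

# Read them back: every value comes back as a plain string.
buf.seek(0)
restored = list(csv.DictReader(buf))
print(restored)  # [{'id': '1', 'price': '9.5', 'active': 'True'}]
```

A Parquet file, by contrast, records the schema (column names and types) in its footer metadata, so a reader gets integers, floats, and booleans back without having to guess.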

victor106 · 2025-03-19T04:30:12

> Apache Iceberg builds an additional layer on top of Parquet files that lets you do ACID transactions, rollbacks, and schema evolution.

Delta format also supports this, correct?

orthoxerox · 2025-03-19T06:38:11

Correct. They have feature parity, basically.
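The table layer that both formats add on top of Parquet can be sketched, very loosely, as a log of immutable snapshots over immutable data files. This is a hypothetical toy model (`Snapshot`, `ToyCatalog`, `commit`, and `rollback` are invented names), not the Iceberg or Delta API. A commit succeeds only if the table hasn't changed since the writer read it, and then atomically moves the "current" pointer. A rollback just moves the pointer back, and a schema change is recorded in metadata without rewriting existing files.

```python
import threading
from dataclasses import dataclass


# Hypothetical toy model, not the Iceberg or Delta API: a snapshot is an
# immutable list of data-file names plus the schema in effect.
@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    files: tuple
    schema: tuple


class ToyCatalog:
    def __init__(self, schema):
        self._lock = threading.Lock()
        self.snapshots = {0: Snapshot(0, (), tuple(schema))}
        self.current_id = 0

    def current(self):
        return self.snapshots[self.current_id]

    def commit(self, base_id, new_files=(), new_schema=None):
        # Optimistic concurrency: succeed only if nobody else committed
        # since `base_id` was read (a compare-and-swap on the pointer).
        with self._lock:
            if base_id != self.current_id:
                raise RuntimeError("conflict: table changed, retry the commit")
            base = self.snapshots[base_id]
            snap = Snapshot(
                snapshot_id=max(self.snapshots) + 1,
                files=base.files + tuple(new_files),
                # Schema evolution touches metadata only; files stay as-is.
                schema=tuple(new_schema) if new_schema is not None else base.schema,
            )
            self.snapshots[snap.snapshot_id] = snap
            self.current_id = snap.snapshot_id  # the atomic pointer swap
            return snap

    def rollback(self, snapshot_id):
        # Rolling back repoints the table at an older snapshot;
        # no data files are rewritten or deleted.
        with self._lock:
            if snapshot_id not in self.snapshots:
                raise KeyError(snapshot_id)
            self.current_id = snapshot_id


cat = ToyCatalog(["id", "value"])
s1 = cat.commit(0, ["part-0001.parquet"])
s2 = cat.commit(s1.snapshot_id, ["part-0002.parquet"],
                new_schema=["id", "value", "note"])
print(cat.current().files)   # ('part-0001.parquet', 'part-0002.parquet')
print(cat.current().schema)  # ('id', 'value', 'note')

try:
    cat.commit(s1.snapshot_id, ["part-0003.parquet"])  # stale base snapshot
except RuntimeError as e:
    print(e)  # conflict: table changed, retry the commit

cat.rollback(s1.snapshot_id)
print(cat.current().files)   # ('part-0001.parquet',)
```

In the real formats the snapshot history lives in files next to the data (Iceberg's metadata and manifest files, Delta's JSON commit log under `_delta_log`), and the atomic step is delegated to a catalog or to the storage layer rather than an in-process lock.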