
Parquet is not HDF5. It is a static format, not a B-tree in disguise the way HDF5 is.

You can have compressed Parquet columns with 8192 entries that are only a couple of tens of bytes in size. 600 columns in a row group is then about 12 KB, leading to a 100 GB file, not a petabyte. That is four orders of magnitude of difference between your assessment and mine.
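
A minimal sketch of that back-of-the-envelope arithmetic (the ~20 bytes per column chunk, 600 columns, and 8192 entries per row group are the figures above; the 100 GB and 1 PB totals are the two estimates being compared):

    ROWS_PER_GROUP = 8192      # entries per column chunk (row group height)
    BYTES_PER_CHUNK = 20       # a well-compressed column chunk, a couple of tens of bytes
    COLUMNS = 600

    row_group_bytes = COLUMNS * BYTES_PER_CHUNK        # ~12 KB per row group
    bytes_per_row = row_group_bytes / ROWS_PER_GROUP   # ~1.5 bytes per row

    for label, total_bytes in [("100 GB", 100e9), ("1 PB", 1e15)]:
        rows = total_bytes / bytes_per_row
        groups = total_bytes / row_group_bytes
        print(f"{label:>6}: ~{rows:.1e} rows in ~{groups:.1e} row groups")

The two totals differ by a factor of 10,000, which is the four-orders-of-magnitude gap in question.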





