Ah that makes sense! Looking at their results and how RedShift was so much faste...

monstrado · on Nov 7, 2013

Impala isn't technically Cloudera only, it's open source (https://github.com/cloudera/impala), and other people have gotten it to run on their Hadoop distribution, but since it's developed by Cloudera, it was developed to run on the CDH platform (Hadoop).

Parquet was a joint effort between Cloudera and Twitter, and now it's being developed by many other companies. You can use it with Hive, Pig, MapReduce, Cascading, Crunch and I think Apache Drill's first milestone has adopted it as a columnar format as well. Parquet also allows you to use your Avro or Thrift schema (soon Protobuffs) to write Parquet data, too.

It's a separate project in the ecosystem and has its own roadmap (https://github.com/Parquet/parquet-mr).