What the author is trying to say is that (IMHO) running a database on a MPP (qui...

eclark · on May 16, 2013

You can add Impala to HBase and get a pretty good SQL-based, low-latency analytics solution (if your data is structured so that Impala can take advantage of row key ordering).

* Disclosure: I'm an Apache HBase committer, I've written parts of Impala's HBase integration, and I work at Cloudera.

nemothekid · on May 17, 2013

Apples to oranges? How so, if you don't mind me asking? What about HBase makes it different from Redshift/Greenplum/Oracle?

jackowayed · on May 17, 2013

HBase isn't built for analysis workloads. It doesn't have a complex query engine, so you end up having to do massive scans (which aren't especially fast), transfer a ton of data to the node where your analytics code is running, and do the computation there. If things are too big to run on one machine, that's your problem, not HBase's.

On the other hand, Impala, Redshift, Oracle Exadata, etc. let you ask the database to do work at a much higher level. That allows for much better performance because the data storage and computation layers can work in tandem (for example, each storage node can prune down to only the data your query needs before anything hits the network). The database, not the writer of the analysis routine, does the work of optimizing for multiple cores and nodes.
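
To make the contrast concrete, here is a toy sketch in plain Python. It does not use any real HBase or Impala API; `StorageNode`, `client_side_total`, and `pushed_down_total` are hypothetical names invented for illustration, and the rows are made-up sample data. It only counts how many items cross the "network" when the client scans everything versus when each node filters and aggregates locally.

```python
from dataclasses import dataclass


@dataclass
class StorageNode:
    """Hypothetical stand-in for one storage node holding (region, amount) rows."""
    rows: list

    def scan(self):
        # Scan-style access: hand back every row and let the caller sort it out.
        return list(self.rows)

    def partial_sum(self, region):
        # Pushdown-style access: filter and aggregate locally, return one number.
        return sum(amount for r, amount in self.rows if r == region)


def client_side_total(nodes, region):
    """Pull all rows to the client, then filter and sum there."""
    shipped = 0
    total = 0
    for node in nodes:
        rows = node.scan()
        shipped += len(rows)  # every row crosses the network
        total += sum(amount for r, amount in rows if r == region)
    return total, shipped


def pushed_down_total(nodes, region):
    """Ask each node for a partial result and combine the partials."""
    partials = [node.partial_sum(region) for node in nodes]
    return sum(partials), len(partials)  # one value per node crosses the network


nodes = [
    StorageNode([("eu", 10), ("us", 5), ("eu", 7)]),
    StorageNode([("us", 3), ("ap", 8)]),
    StorageNode([("eu", 1), ("ap", 2), ("us", 4)]),
]

print(client_side_total(nodes, "eu"))   # (18, 8): same answer, 8 rows shipped
print(pushed_down_total(nodes, "eu"))   # (18, 3): same answer, 3 partials shipped
```

Both paths give the same total; the difference is how much data has to move and where the computation runs, which is the gap that grows as the tables get large.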