Not all data warehouses are designed for OLAP. As the performance of these systems increases, it is possible to satisfy OLAP use cases with some warehouses such as BQ, but they are still not ideal once your number of concurrent queries scales while your response-time requirements stay constant. The BQ team certainly doesn't consider their product an OLAP solution, and neither does the Redshift team. AWS is actually working on its own OLAP offering, Timestream; that is how they pitch it, as their OLAP solution.
BigQuery's "petabyte aggregations in seconds" is a false claim. I just launched a query to count the number of records in a 1.4 PB dataset, and it took over a minute just to schedule the query, even with 1000+ compute units available. I have been able to run a similar query in seconds on top of Druid; that cluster had a run rate of about $25k/month at the time, while I believe we pay $40k+/month just on our BQ deal. Maybe under ideal conditions BigQuery can run such a query in seconds, but I have never seen it do so.
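To be concrete, the query was nothing fancy, basically a straight count over the table; something like this (the table name here is made up, but the shape is right):

    SELECT COUNT(*)
    FROM `my-project.analytics.events`;  -- ~1.4 PB table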
I have experience with Snowflake; it is great if you are on AWS, but I wouldn't choose it over BigQuery. I feel that Snowflake's real strength is its data-sharing protocol, which lets us tap into partners' datasets without major data transfer costs. We use it for that reason only.
You're using a strange definition of OLAP here; it's really not tied to performance or concurrency. It only means you can send an ad-hoc query and get results back directly in response, rather than waiting on pre-computed or batched processing happening somewhere else.
Redshift and BigQuery are both OLAP products, and their documentation and whitepapers say as much. Their performance profiles are very different, but that's down to architecture. Redshift is an always-running cluster of EC2 instances running a forked, scale-out Postgres. BigQuery is an entirely different engine based on Dremel, which parallelizes as much as possible and leans on Google's proprietary datacenters, with petabits of internal bandwidth, to support it. AWS Timestream is an entirely different product aimed specifically at time-series data, and basically a separate category from either of these.
The only reason Druid is faster is that it pre-aggregates and indexes data. You can aggregate tables yourself in Redshift and use the right distribution and sort keys to get most of the same performance. BigQuery just launched materialized views and is working on scheduling improvements and in-memory caching as well. Snowflake now runs on Azure and GCP and can give sub-second responses when the warehouse is running and the data is cached or the tables are smaller.
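For reference, a BigQuery materialized view is a one-statement sketch like this (table and column names are just placeholders):

    CREATE MATERIALIZED VIEW `my-project.analytics.daily_event_counts` AS
    SELECT event_type, event_date, COUNT(*) AS events
    FROM `my-project.analytics.events`
    GROUP BY event_type, event_date;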
I wasn't able to achieve the same performance with Redshift aggregates; I tried that first, before deciding to migrate from Redshift to Druid back in 2014. We deal with dozens of dimensions per event, and no combination of distribution keys in Redshift was able to give us the same performance over arbitrary scans + aggregations.
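What we tried looked roughly like this, a pre-aggregated CTAS with different dist/sort key combinations (the column names here are hypothetical; our real tables had dozens of dimensions):

    CREATE TABLE events_agg
      DISTKEY (account_id)
      SORTKEY (event_date)
    AS
    SELECT account_id, event_date, country, device, COUNT(*) AS events
    FROM events
    GROUP BY account_id, event_date, country, device;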
Druid is not fast only because it pre-aggregates; its memory structure is also designed for scans over hundreds of dimensions.
Materializing views in BigQuery is just one DAG task. If you already have something like Airflow in your stack, I don't see how the feature is worth mentioning. We are talking about denormalized, time-series data here.
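The whole thing is one scheduled statement; something like this (names are placeholders), run nightly from a single Airflow task, covers it:

    -- executed by one scheduled DAG task
    CREATE OR REPLACE TABLE `my-project.analytics.events_rollup` AS
    SELECT event_type, event_date, COUNT(*) AS events
    FROM `my-project.analytics.events`
    GROUP BY event_type, event_date;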
I am speaking from experience with each one of these products. Perhaps I did it all wrong, but we certainly achieved the objectives we were after.