> Almost all big data storage solutions are NoSQL.
I think it's important to distinguish between OLAP and OLTP.
For OLAP use cases (which is what this post is mostly about) it's almost 100% SQL.
The biggest players are Databricks, Snowflake and BigQuery. Others include AWS's tools (Glue, Athena), Trino, ClickHouse, etc.
I bet there's a <1% market for "NoSQL" tools such as MongoDB's "Atlas Data Lake", and there are probably a bunch of MapReduce jobs still running in production, but these are the exception, not the rule.
For OLTP "big data", I'm assuming we're talking about "scale-out" distributed databases, which are either SQL (e.g. CockroachDB, Vitess, etc.), SQL-like (Cassandra's CQL, Elasticsearch's non-ANSI SQL, InfluxDB's InfluxQL) or built around a purpose-built language/API (Redis, MongoDB).
I wouldn't say OLTP is "almost all" NoSQL, but definitely a larger proportion compared to OLAP.
> Almost all big data storage solutions are NoSQL.
Most I've seen aren't. NoSQL means a non-relational database. Most big data solutions I've seen don't use a database at all. An example is Hadoop.
Once you have a database, SQL makes a lot of sense. There are big data SQL solutions, mostly in the form of columnar read-optimized databases.
On the above, a little bit of relational can make a huge performance difference. For example, you can have one big table of compact rows whose columns are indexes (keys) into small lookup tables. That can be algorithmically a lot more performant than the same thing without relations.
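To make that concrete, here's a minimal sketch of the layout using Python's built-in `sqlite3`. The table and column names (`sale`, `country`, `product`, `amount_cents`) are made up for illustration, and SQLite is a row store, not a columnar engine, so this only shows the shape of the schema, not the performance of a real big data system.

```python
import sqlite3


def build_star_schema(conn: sqlite3.Connection) -> None:
    """Create two small lookup tables and one big, compact fact table."""
    conn.executescript(
        """
        CREATE TABLE country (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
        CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
        -- The big table stores only small integers: keys into the lookup
        -- tables plus the measure, instead of repeating strings on every row.
        CREATE TABLE sale (
            country_id   INTEGER NOT NULL REFERENCES country(id),
            product_id   INTEGER NOT NULL REFERENCES product(id),
            amount_cents INTEGER NOT NULL
        );
        """
    )
    conn.executemany(
        "INSERT INTO country (id, name) VALUES (?, ?)",
        [(1, "Germany"), (2, "Japan")],
    )
    conn.executemany(
        "INSERT INTO product (id, name) VALUES (?, ?)",
        [(1, "widget"), (2, "gadget")],
    )
    conn.executemany(
        "INSERT INTO sale (country_id, product_id, amount_cents) VALUES (?, ?, ?)",
        [(1, 1, 500), (1, 2, 1200), (2, 1, 700), (1, 1, 300)],
    )


def revenue_by_country(conn: sqlite3.Connection) -> list:
    """Aggregate the fact table, resolving names through the lookup table."""
    return conn.execute(
        """
        SELECT c.name, SUM(s.amount_cents)
        FROM sale AS s
        JOIN country AS c ON c.id = s.country_id
        GROUP BY c.name
        ORDER BY c.name
        """
    ).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    build_star_schema(connection)
    print(revenue_by_country(connection))
```

The point is that the scan over `sale` only touches fixed-width integers, and the join to `country` hits a table small enough to sit in memory.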
Anyone reading the above comment, ask yourself: do you actually think that when interest rates dropped to zero, they suddenly invented a system that was better than SQL? “Horizontal scaling”? I’m sorry, I don’t speak marketing language. What is that? I’ve only been doing this for decades.
a) It has a built-in and supported horizontal scalability / HA solution.
b) For some use cases, e.g. star schemas, it has significantly better performance.
> Big data solutions aren't nosql
Almost all big data storage solutions are NoSQL.