
... with a caveat regarding his efforts to remove NULL



That's definitely part of the elegance from my perspective.


I used to think this also. However, after diving really deep into OLAP for business reporting over the last 6 months, I find NULL to be the single most powerful idea in SQL. Here are some use cases for NULL that I find super useful:

COUNT(case when <complex condition> then 1 end) as foo

Counts occurrences of a complex condition over a row set. Works because COUNT ignores NULL.
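A rough sketch of how that reads in practice (the orders table and its columns here are made up):

  select
      count(*)                                        as total_orders,
      count(case when status = 'refunded' then 1 end) as refunded_orders,
      count(case when amount > 1000 then 1 end)       as large_orders
  from orders

Because the CASE has no ELSE branch, it yields NULL whenever the condition is false, and COUNT simply skips those rows.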

NVL(1/NULLIF(foo, 0), 0)

Protects against division by zero. The expression 1/X is a simple example, but this generalizes well. In fact, I've come to think that NULL turns any SQL value into a monadic option type like those used in functional languages and Rust.
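For example (NVL is Oracle-style syntax; COALESCE is the portable spelling, and the revenue/clicks columns are invented for the sketch):

  select
      coalesce(revenue / nullif(clicks, 0), 0) as revenue_per_click
  from ad_stats

NULLIF(clicks, 0) turns a zero denominator into NULL, the division then propagates NULL instead of raising an error, and COALESCE maps that NULL back to a default.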

Note that I’m not saying the above cannot be done without NULL, just that NULL is a really useful tool in my experience.


Done a fair bit of BI and OLAP this last decade, and I still feel NULL causes more issues than it solves.

For the counting case, sum(case when ... then 1 else 0 end) generally works as well, usually at little to no cost.
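Roughly (hypothetical table, just to show the shape):

  select
      sum(case when status = 'refunded' then 1 else 0 end) as refunded_orders
  from orders

One small difference to keep in mind: over an empty row set, SUM returns NULL while COUNT returns 0.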

Having nullability as a concept in the query language isn't very problematic, though; I might even consider it helpful on average, as long as a view or table can never contain nulls.

Nullable columns in tables and queries are the problem. It boils down to how each nullable column (in)effectively describes a schema equivalent to one table for each nullable column and one for the non-nullable columns, where the relationship between these tables is quite underconstrained.

Thus, to ensure data quality and correctness, you have to query all nullable columns of interest to make sure that the various combinations of nulls in those columns don't interfere with what you are trying to report. You also have to track that no component of the system suddenly starts reporting nulls in columns where it previously didn't, as that is equivalent to a schema change.

This wouldn't be much of an issue if there were never more than one nullable column per table, or at the very least if only the absolute minimum of columns were nullable. This is, however, only very rarely the case; usually there are far too many nullable columns. To build reliable and stable reporting, one then has to solve the underspecification of the source system in the analysis and reporting pipeline, for a class of issues that for the most part should be the responsibility of the source systems.
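To make the "implicit split schema" point concrete: a nullable column carries roughly the same information as a separate table, but without a place to hang constraints on the relationship (table names here are invented):

  -- design A: one table with a nullable column
  create table customer_a (
      id         int primary key,
      churn_date date null
  );

  -- design B: the same information as two tables, where the
  -- relationship between them is explicit and can be constrained
  create table customer_b (
      id int primary key
  );
  create table customer_b_churn (
      customer_id int primary key references customer_b(id),
      churn_date  date not null
  );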

The best use of nulls I know of is using them to lower the cost of table scans in Google BigQuery.


Just be careful not to use * : COUNT does not always ignore NULLs. If you specify a column, it ignores NULLs; if you use *, it counts the row (this is on SQL Server, btw... might be different on other platforms):

  create table foo(bar int)
  insert into foo values (1), (null), (2)

  select count(*), count(bar) from foo

  -- returns: 3, 2


It's exactly unlike a monadic option type. It's much more like null is in Java, C#, etc. Every type gets a NULL value here, whereas in ML/Haskell/etc. the use of Option/Maybe/etc. is explicit.


I believe they were well aware of that, but were specifically pointing out the similarity with Maybe in how NULL values just bubble up, like Nothing does.

This is very different from Java, where a chain of operations on null values typically leads very quickly to runtime exceptions.
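In SQL itself, that bubbling looks roughly like this (results noted in comments; runs as a plain SELECT on most engines, Oracle would want FROM dual):

  select
      1 + null                            as arith,    -- null
      upper(null)                         as func,     -- null
      case when null = null then 'eq' end as compare   -- null: the comparison is unknown, not true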


Also, one purpose for NULL is "unknowable" or "undefined"; without it, you end up jamming some other value there in its place.

I know NULL causes a lot of heartache but it is a useful concept in DB design.



