
... with a caveat regarding his efforts to remove NULL



That's definitely part of the elegance from my perspective.


I used to think this also. However, after diving really deep into OLAP for business reporting over the last 6 months, I find NULL to be the single most powerful idea in SQL. Here are some use cases for NULL that I find super useful:

COUNT(case when <complex condition> then 1 end) as foo

Counts occurrences of a complex condition over a row set. Works because COUNT ignores NULL.
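A rough sketch of how that reads in practice (the orders table and its columns here are made up):

  select
      count(*)                                        as total_orders,
      count(case when status = 'refunded' then 1 end) as refunded_orders,
      count(case when amount > 1000 then 1 end)       as large_orders
  from orders

Because the CASE has no ELSE branch, it yields NULL whenever the condition is false, and COUNT simply skips those rows.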

NVL(1/NULLIF(foo, 0), 0)

Protects against division by zero. The expression 1/X is a simple example, but this generalizes well. In fact, I've come to think that NULL turns any SQL value into a monadic option type like those used in functional languages and Rust.
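For example (NVL is Oracle-style syntax; COALESCE is the portable spelling, and the revenue/clicks columns are invented for the sketch):

  select
      coalesce(revenue / nullif(clicks, 0), 0) as revenue_per_click
  from ad_stats

NULLIF(clicks, 0) turns a zero denominator into NULL, the division then propagates NULL instead of raising an error, and COALESCE maps that NULL back to a default.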

Note that I’m not saying the above cannot be done without NULL, just that NULL is a really useful tool in my experience.


Done a fair bit of BI and OLAP this last decade, and I still feel NULL causes more issues than it solves.

For the counting case, sum(case when ... then 1 else 0 end) generally works as well, usually at little to no cost.
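Roughly (hypothetical table, just to show the shape):

  select
      sum(case when status = 'refunded' then 1 else 0 end) as refunded_orders
  from orders

One small difference to keep in mind: over an empty row set, SUM returns NULL while COUNT returns 0.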

Having nullability as a concept in the query language isn't very problematic, though; I might even consider it helpful on average, as long as a view or table can never contain nulls.

Nullable columns in tables and queries are the problem. It boils down to how each nullable column (in)effectively describes a schema equivalent to one table for each nullable column and one for the non-nullable columns, where the relationship between these tables is quite underconstrained.

Thus, to ensure data quality and correctness, you have to query all nullable columns of interest to make sure that the various combinations of nulls in those columns don't interfere with what you are trying to report. You also have to track that no component of the system suddenly starts reporting nulls in columns where it previously didn't, as that is equivalent to a schema change.

This wouldn't be much of an issue if there were never more than one nullable column per table, or at the very least if only the absolute minimum of columns were nullable. This is, however, only very rarely the case; usually there are far too many nullable columns. To build reliable and stable reporting, one then has to solve the underspecification of the source system in the analysis and reporting pipeline, for a class of issues that for the most part should be the responsibility of the source systems.
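To make the "implicit split schema" point concrete: a nullable column carries roughly the same information as a separate table, but without a place to hang constraints on the relationship (table names here are invented):

  -- design A: one table with a nullable column
  create table customer_a (
      id         int primary key,
      churn_date date null
  );

  -- design B: the same information as two tables, where the
  -- relationship between them is explicit and can be constrained
  create table customer_b (
      id int primary key
  );
  create table customer_b_churn (
      customer_id int primary key references customer_b(id),
      churn_date  date not null
  );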

The best use of nulls I know of is using them to lower the cost of table scans in Google BigQuery.


Just be careful not to use * : COUNT does not always ignore NULLs. If you specify a column, it ignores NULLs; if you use *, it counts the row (this is on SQL Server, btw... might be different on other platforms):

  create table foo(bar int)
  insert into foo values (1), (null), (2)

  select count(*), count(bar) from foo

  -- returns: 3, 2


It's exactly unlike a monadic option type. It's much more like null is in Java, C#, etc. Every type gets a NULL value here, whereas in ML/Haskell/etc. the use of Option/Maybe/etc. is explicit.


I believe they were well aware of that, but were specifically pointing out the similarity with Maybe in how NULL values just bubble up, like Nothing does.

This is very different from Java, where a chain of operations on null values typically leads very quickly to runtime exceptions.
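In SQL itself, that bubbling looks roughly like this (results noted in comments; runs as a plain SELECT on most engines, Oracle would want FROM dual):

  select
      1 + null                            as arith,    -- null
      upper(null)                         as func,     -- null
      case when null = null then 'eq' end as compare   -- null: the comparison is unknown, not true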


Also, one purpose for NULL is "unknowable" or "undefined"; without it, you end up jamming some other value there in its place.

I know NULL causes a lot of heartache but it is a useful concept in DB design.



