I don't think Haskell aims for this kind of success.
Abstractions can be useful and they can be beautiful. If your goal is to "easily build functional CRUD apps" you need the former. If you do research in computing science or would like to further your understanding or simply seek to appreciate the elegance and beauty in computation, you need the latter.
Admittedly, the latter goals (except research) are self-directed (they may make you a better or at least a happier programmer) and they don't directly contribute value to your customers, so I understand why you just called them "mental masturbation". However, I strongly disagree with that label. There is a long-term value in learning and general self-improvement.
Unfortunately I don't know of much research in formalizing CRUD apps, but maybe it's out there.
The long-term wish as I see it would be for us as an engineering discipline to develop a really clear understanding of the ___domain, enough to be able to build these CRUD apps in composable ways that can be reasoned about with clear language.
Guiding vague questions might include: What is the essence of a CRUD app? How could you define a CRUD app in terms of sets, functions, and categories? What kinds of laws apply in the general CRUD ___domain? How can we statically verify correctness properties for real-world CRUD systems?
That hope has nothing to do with mental masturbation and everything to do with wanting to concretely improve the state of civilization. And of course to reduce the amount of time spent making and debugging the same kind of thing over and over again.
I am by no means a researcher, but, as a programmer who writes said CRUD apps for a living, I can guarantee that I derive most of my understanding of a complex business ___domain from the structure of the relational database backing it. And what is a relational database? A (time-evolving) subcategory of the category of finite sets! Primitive types (as sets of values) and tables (as sets of primary key values) are objects. Fields (as functions of primary keys) are morphisms. Chained foreign key traversal is morphism composition.
I find this way of reasoning incredibly powerful, because it tells me quickly what kind of transactions make sense in a business ___domain. Of course, this is contingent on the database being well designed. It is exactly the same as with Haskell types - if you can leverage the (type system) schema to encode interesting logical properties of your (problem) business ___domain, then (type checking) checking a query against the schema in itself becomes a lightweight form of formal verification.
One thing that must be remarked, however, is that, sadly, SQL lacks analogues of sum and product types, that is, tables whose cardinality (number of rows) is the sum or product of the cardinalities of other tables. This addition would make SQL even more powerful, but it would require a move to a more category-theory-based foundation for databases. (The relational model is based on first-order logic.)
Isn't a join of two tables essentially the product of the two tables?
That said it does appear that sums are missing. Infact it seems like every mainstream language manages to fail to add sum types for some strange reason and you have to recreate them from scratch.
An unconstrained `join` is indeed the product of two queries, and a `union all` can be used to get their sum as well. This is in the query language. But what I need is sums and products in the table language.
There is a functor from the category of tables to the category of queries, which maps every table `foo` to the query `select * from foo`, however, a table isn't the same thing as a query: table fields may contain user-entered data, whereas query fields are always computed from table fields.
To give a concrete example, consider a database with two primitive tables, `male` and `female`. A primitive table is “independent” in the sense that you can add or remove rows from it at will. In other words, a primitive table behaves just like a normal SQL table.
Now define the derived table `person` as the sum of `male` and `female`. Because `person` is derived, you don't explicitly add or remove rows from it. Instead, every time you add or remove a `male` or `female`, a corresponding `person` also gets added or removed.
What I want is the ability to add the field `name` to the `person` table directly, without it existing in either `male` or `female`. You can't do this in SQL. The situation is similar for products.
> An unconstrained `join` is indeed the product of two queries, and a `union all` can be used to get their sum as well. This is in the query language. But what I need is sums and products in the table language.
The "table" language of SQL (DDL) includes pretty much the the entirety of the "query" language (DQL) through view definitions.
Of course, to do what you would really want to do to use relations to implement product and sum types, you needed materialized views with appropriate unique indexes for the candidate keys (and, ideally, auto-deriving the keys for product/sum tables from those of the base tables -- for product tables you can just concatenate the keys of the base tables, for sum tables you need the keys from the base tables to be equivalent and then to have an additional column that uniquely maps to the source table.)
A materialized view cannot contain user-editable fields unless these fields actually come from some actual table. Hence, a materialized view isn't a table in its own right, let alone a sum or product table.
> A materialized view cannot contain user-editable fields unless these fields actually come from some actual table.
"User-editable" is actually superfluous; this is true (or false) of read-write attributes in exactly the same way as it is of read-only attributes.
> Hence, a materialized view isn't a table in its own right, let alone a sum or product table.
You seem to be use "sum or product type" and "sum or product table" in somewhat unusual ways. Upthread, you suggested that sum and product tables were simply realizations of sum and product types by way of tables; but sum and product types have domains that are, respectively, the sum or product of the domains of the set of types each is based on, they don't include additional data.
The kind if augment sum or product relation you seem to be referring to can be achieved in a relational database through views (including, to the extent useful to the application at hand, materialized/indexed views), where the "base" sum or product type is a materialized view as described in the grandparent comment (including the described indexes), and the additional fields are supplied through a related table with a foreign key constraint (the augmented sum/product being represented with a view that joins that table to the base sum/product view.)
In general, to say that a category has sums and products, these sums and products must be objects of the same category. The axioms for a category are totally agnostic to the concrete nature of its objects. Just because Hask objects are types, it doesn't mean objects in other categories are Haskell types of behave like Haskell types. In the context of databases, it makes sense to treat a schema as a category whose objects are its tables, and whose morphisms are chained foreign key traversals. And tables contain fields that carry data of their own.
I still don't understand. What you call derived tables seem like views to me. They always need to be composed of primitive tables. So there will be a primitive `name` table and the person view will be `name * (male + female)`... Admittedly everything will be in 6NF but it still seems doable. Is there a limitation that I'm missing?
Sorry for the delay. I wasn't really thinking of making a primitive table just for names - what would its primary key be anyway?
The limitation you're missing is that SQL doesn't let you readily associate a user-entered `name` with each `person`. The best you can do is put a `name` field in the `male` table, then another `name` field in the `female` table, and use both `name`s when defining a `person` view. In my opinion, this is inelegant.
It isn't apparent from my identifiers, but, in my `existing.sql`, `person`'s real primary key isn't just `person_id`, but rather `(gender, person_id)`.
The way you've handled it, now you have an invariant to maintain that `male_id`s and `female_id`s don't collide. If you want to define arbitrarily many sum tables in your database, this can be really hard to enforce. My `proposed.sql` doesn't have such a problem.
Product tables are missing too. Let's say you have tables `foo` and `bar`. For every `foo` and every `bar`, you want the user to specify a `qux` value. Presently, what you need to do is:
(0) Create a table `foo_bar`, with fields `foo_id`, `bar_id` and `qux`. In particular, `qux` must be nullable. [Yuck!]
(1) Add triggers to `foo` and `bar` that automatically insert or delete rows from `foo_bar`.
(2) Hope [I'm not joking] the user remembers to set all the `qux` values in `foo_bar` whenever he inserts a row into either `foo` or `bar`.
How does SQL lack product tables? The product of the column spec (foo int, bar bool) with the column spec (baz text, quux date) is the column spec (foo int, bar bool, baz text, quux date).
A column specification isn't a table. A table contains user-editable fields. If you have two SQL tables and compute their Cartesian product, the result is a query, not a table.
In my proposal ( https://gist.github.com/eduardoleon/1e8ad9174ec5ae0386dd ), the category is still freely generated, but from something that has more structure than a quiver - some nodes may be designated as coproducts or products of other nodes.
In either case, the category can't possibly have all coproducts and products, and that's okay with me - a database can only have finitely many tables, after all. My beef with SQL is that it doesn't allow this category to have any coproducts and products at all.
What do you mean by "SQL tables"? Presumably not the type of a table (you rejected that notion when I proposed it under the name "column specification") but rather a table with all its data?
Yes, by “SQL table”, I mean the table itself, with all its data. I don't think it's useful to think about “the type of a table”. Rather, I view tables as types in their own right - a table is the type of its rows.
I can see why this would seem weird from a Haskell perspective. Haskell encourages the programmer to view types as static collections of values. That is, normally, Haskell types don't get new inhabitants as your program runs. There are exceptions to this rule, like `IORef`s and `STRef`s, but idiomatic Haskell doesn't use these much AFAICT.
However, in Standard ML, some (static!) types are dynamically evolving collections of values. For example, if the control of flow reaches the line `exception Foo of string`, a new constructor `Foo` is added to the existing `exn` type. (As for why this is useful, see: https://existentialtype.wordpress.com/2012/12/03/exceptions-... ) Or, if the expression `ref x` is evaluated, where `x` has type `foo`, then the type `foo ref` gets a new inhabitant - a freshly created mutable cell initially containing `x`.
“Tables as types” is just reusing the idea of dynamically evolving (static!) types in a database context. Inserting or deleting rows from your `customer` table changes its collection of inhabitants, but it doesn't change the fact that there is a single type of customers.
As I mentioned in my comment below: https://news.ycombinator.com/item?id=10529838 , the category of SQL queries has coproducts and products, but the category of SQL tables does not. Coproduct and product tables are very useful in their own right.
Not sure what definition you're using for the category of tables, but I don't think the distinction between table and query is really that significant, at least from a theory point of view. You can declare views and materialized views. In some SQL dialects you can even define triggers which allow you to 'update' them, not that I would have thought mutability would be particularly nice to reason about in a category-theoretic framework.
If you really want to do theory on this stuff, just use the relational algebra, or better yet just plain first-order logic. Much nicer, you have all the products and coproducts you want, and the results can probably be re-applied to SQL with a bit of cludge-work :)
Agreed that SQL is ugly as hell, but if you want to talk about its theoretical properties that's a separate debate. Theory doesn't care whether something's aesthetically pleasing, just whether it's possible.
I don't think Haskell aims for this kind of success.
Abstractions can be useful and they can be beautiful. If your goal is to "easily build functional CRUD apps" you need the former. If you do research in computing science or would like to further your understanding or simply seek to appreciate the elegance and beauty in computation, you need the latter.
Admittedly, the latter goals (except research) are self-directed (they may make you a better or at least a happier programmer) and they don't directly contribute value to your customers, so I understand why you just called them "mental masturbation". However, I strongly disagree with that label. There is a long-term value in learning and general self-improvement.