If your database doesn't enforce the schema you still have a schema, it's just ad-hoc and spread across all your different processes, and no one quite agrees what it is. In the real world as requirements change and your app/service increases in complexity this becomes a constant source of real bugs while simultaneously leading to garbage data. This is not theoretical, we have a lot of direct painful experience with this. Best case scenario your tests and tooling basically replicate a SQL database trying to enforce the schema you used NoSQL to avoid in the first place.
Indexes are fast but they aren't magic. A lot of what a traditional SQL database does is providing a query optimizer and indexes so you can find the data you need really fast. Cramming everything into a few tables means everything has to live in the same index namespace. Yes you can use views and sometimes even indexed views, but then you have a schema so why jump through hoops to use non-optimized storage when the database has actual optimized storage?
Separate database tables can be put on separate storage stacks. A single table can even be partitioned onto separate storage stacks by certain column values. Cramming everything into four tables makes that a lot more complicated. It can also introduce contention (depending on locking strategies) where there wouldn't normally be any.
IMHO most systems would be better served by sharding databases than by using NoSQL and pretending they don't have a schema. If application design prevents sharding then scaling single-master, multiple-read covers a huge number of cases as well. The multiple-master scenario NoSQL systems are supposed to enable is a rare situation and by the time you need that level of scale you'll have thrown out your entire codebase and rewritten it twice anyway.
The key to schema migrations is just to add columns and tables if needed, don't bother actually migrating. Almost all database engines can add columns for "free" because they don't go mutate existing rows. Some can drop columns for "free" too by marking the field as obsolete and only bothering to remove it if the rows are touched.
Postgres (and at least one other RDBMS) has partial indexes, which pretty much solves the index namespace problem you mention: http://www.postgresql.org/docs/8.0/static/indexes-partial.ht... Partial indexes are integrated into the proof-of-concept repo I linked.
Storing a data type field in the generic storage table enables the same partitioning ability as a standard schema.
99% of NoSQL database users just don't want to deal with migrations, even if they're "free" (another big issue is synchronizing application code state and DB migration state of production, testing, and developer machines), so what they really need is NoDDL, YesSQL.
> Almost all database engines can add columns for "free" because they don't go mutate existing rows. Some can drop columns for "free" too by marking the field as obsolete and only bothering to remove it if the rows are touched.
Didn't know that, thanks.
> It can also introduce contention (depending on locking strategies) where there wouldn't normally be any.
Didn't think of that. I'm aiming this at 99% of NoSQL users in which doing things you could do with SQL requires much more effort, so allowing them to do it with SQL can accept a modest performance degradation, but if you have any good links relevant to how this storage design would affect lock contention, please share.
If your database doesn't enforce the schema you still have a schema, it's just ad-hoc and spread across all your different processes, and no one quite agrees what it is. In the real world as requirements change and your app/service increases in complexity this becomes a constant source of real bugs while simultaneously leading to garbage data. This is not theoretical, we have a lot of direct painful experience with this. Best case scenario your tests and tooling basically replicate a SQL database trying to enforce the schema you used NoSQL to avoid in the first place.
Indexes are fast but they aren't magic. A lot of what a traditional SQL database does is providing a query optimizer and indexes so you can find the data you need really fast. Cramming everything into a few tables means everything has to live in the same index namespace. Yes you can use views and sometimes even indexed views, but then you have a schema so why jump through hoops to use non-optimized storage when the database has actual optimized storage?
Separate database tables can be put on separate storage stacks. A single table can even be partitioned onto separate storage stacks by certain column values. Cramming everything into four tables makes that a lot more complicated. It can also introduce contention (depending on locking strategies) where there wouldn't normally be any.
IMHO most systems would be better served by sharding databases than by using NoSQL and pretending they don't have a schema. If application design prevents sharding then scaling single-master, multiple-read covers a huge number of cases as well. The multiple-master scenario NoSQL systems are supposed to enable is a rare situation and by the time you need that level of scale you'll have thrown out your entire codebase and rewritten it twice anyway.
The key to schema migrations is just to add columns and tables if needed, don't bother actually migrating. Almost all database engines can add columns for "free" because they don't go mutate existing rows. Some can drop columns for "free" too by marking the field as obsolete and only bothering to remove it if the rows are touched.