Follow this advice with caution. Dropping foreign keys is effectively giving up part of the C in ACID. It should be done with very, very open eyes to the downsides. I'm not sure the author is selling the "when" side of this very well. Migrations are "hard" because the database is forcing you to handle correctness criteria that are easy to ignore. "Lock contention" is the database stopping your sloppy, ill-thought-out application code from doing the Wrong Thing. Yes, of course, you take a minor performance hit in exchange for consistency, but how many of us actually work on applications where that trade-off is the wrong one? Developers are frustratingly superstitious about scale and their need for it.
The data will very likely outlive the application that created it. It's almost guaranteed to outlive your tenure on the team. Viciously guarding its correctness solves all the problems that not guarding it causes.
Whenever anyone talks about “the application,” I immediately ask “what about all the other applications?” I promise you that you will find customer service and accounting and biz dev have also built stuff that uses the database to get their jobs done (probably not with the same ORM or even the same language) unless you have taken draconian measures to prevent them.
Interesting that this whole thread makes no distinction between read and write access, as those are dramatically different use cases. Read access is by far the more necessary, and is usually solved relatively easily by saving snapshots to a data warehouse. This is no panacea, as data can still easily be misinterpreted, or replicated and used out of context in violation of the production data lifecycle expected by the team that owns the source of truth, but in many cases it's trivial and minimizes coordination overhead.
Write access on the other hand is a different story. IMO restricting write access of a database to a single application is table stakes for scaling complex applications. Sure there are other approaches such as writing all your logic in the database via constraints and stored procedures and making the DBA a god-like figure, but these approaches have fallen out of favor as they've proven less scalable compared to wrapping a DB with a service that has exclusive write access. The latter arrangement allows many constraints to be enforced in a horizontally scalable and more legible layer, while still leveraging the DB to prevent races with a better menu of tradeoffs.
Of course this requires thoughtful service and interface design by competent technical ___domain experts, which is easier said than done, but the alternative allows the overall system cohesion to degrade to where no one understands the system well enough to make any changes without risking major incidents. At that point, the agency of system builders and maintainers is replaced by care and feeding of the unknowable system to not upset the status quo, accompanied with increasingly byzantine hacks and workarounds to enable any business changes.
The no-foreign-keys advice smacks of 2005/2010, when the lack of support in some frameworks (Rails included) prompted developers to dismiss them as unnecessary. This piece of advice at least quantifies the terms of the tradeoff.
Regarding the correct "saving snapshots to a data warehouse", if there are one million web apps in the world, how many of them have the scale to noticeably benefit from either doing without foreign keys or from a data warehouse? I've been using many of them like everybody else but they are totally irrelevant to the long tail of apps and developers.
By the way, a good DBA can work miracles for the performance of most databases in that long tail. A few days of work are worth the cost, especially if the team pays attention and learns the lesson. The best remark I got about a DB of mine was "not bad for being only a developer". The DBA's version was better.
If I recall correctly, "no foreign keys" hit its stride back in early PHP days when MySQL didn't support them properly. Rather than cop to the fact that MySQL just implemented them badly, MySQL AB went on a dev PR run telling folks that foreign keys weren't actually useful and just slowed a system down.
Once MySQL implemented them less horribly, the PR push finally started to die down. I will never forgive them for that, and decades later where Oracle controls MySQL (and arguably is doing a better job), I still hold a grudge against MySQL that I have to actively suppress when the contract demands it.
Nobody with a Computer Science or especially a Software Engineering degree should have fallen for it, but I know a number of good developers with a more varied background. Some of them inevitably became team leaders etc. By varied backgrounds I mean Philosophy, Agricultural Sciences, Graphic Design. Some of them know very well how a database works; some admit to never having learned SQL, let alone the theory and rationale behind some of these technologies. Give them some blog posts that get the job done without foreign keys and they won't even know what a foreign key is until it's 2015 and people start moving to PostgreSQL.
My degree is in literature. (I'm that dev who also writes docs, makes presentations, and serves as tech lead.) I still know that foreign key constraints should be the default. I even often have to remind those with CS and CE degrees that DBs aren't just dumb bit buckets and that database schemas are critical business logic.
Bad programmers worry about the code. Good programmers worry about data structures and their relationships. – Linus Torvalds
> Regarding the correct "saving snapshots to a data warehouse", if there are one million web apps in the world, how many of them have the scale to noticeably benefit from either doing without foreign keys or from a data warehouse?
Yes I agree. This advice was assuming you have scale that necessitates multiple services, at which point you'll want to be able to query against data from multiple sources. I'm with you that the vast majority of teams (finger in the air: less than 20 full-time engineers working on a typical web app) are probably better off with a monolith and single database. At this point a read replica is a low effort way to safely provide access to a wide range of stakeholders.
Indeed - I immediately thought of a hypothetical system (similar to many systems I’ve worked on) where SELECTS are 100x more common than INSERTS, which are 100x more common than DELETES.
The 5x speed up on delete performance is a useless optimization.
+1. When I started in the industry, it was common for more experienced developers to drill the “data outlives the application that generated it” principle into you. Somewhere in the transition to NoSQL and back we lost this.
The _data_ yes, the _database_ no. The data will get migrated to different solutions at different times. I've lost track of how often I've been porting MySQL to SQL Server, SQL Server to Postgres, Postgres to Mongo, Mongo to cloud etc.
This really depends on the size of your data set. If you have a large database (TBs+) it's likely to be very long-lived due to the effort/hardware resources a migration requires - especially if you want to improve the schema when migrating.
If you have an established business or startup, data will outlive the application. However, you need a product that lives long enough for either data or application to matter.
In the startup world, that means making decisions that help you ship now, at the expense of debt/costs down the road.
IME, Foreign Keys do not slow down development velocity, after you factor out the upfront time investment it takes to understand them, normal forms, and other RDBMS concepts. Other posters in this thread have argued that they take CPU cycles during program execution - which is factually correct.
It wasn't the transition to NoSQL. It was the transition to web APIs. Now the application that can access the database has a universal shim in place that deals with the messy business of taking requests and applying them to the data store.
Now there's typically three applications that need DB access:
1. The application it was built for.
2. The read-only reporting and visualization tools.
My assumption (with modern applications!) is that nothing but the role directly owning the data will access the data. The development and DBA teams will likely have a role they can assume after performing a carefully-audited breakglass procedure to use in an emergency (rare) or to fulfill audit tasks. At least in my org this is a well-known problem with legacy applications sharing databases. Limit access to the database to a single role, used by a single application, and you avoid so many issues.
When your data has outlived several generations of front-end technologies (mine started with Powerbuilder), you will find that ownership of the data does not control the evolution of how it will be used with what replaces past clients.
One particularly interesting Oracle problem is:
ORA-00060: deadlock detected while waiting for resource
Tom Kyte's book, Expert One-on-One Oracle, describes the primary culprit:
"Oracle considers deadlocks to be so rare, so unusual, that it creates a trace file on the server each and every time one does occur... The number one cause of deadlocks in the Oracle database, in my experience, is un-indexed foreign
keys."
For another perspective, add to this a default setting in every SQLite database:
$ sqlite3 verynew.db
SQLite version 3.34.1 2021-01-20 14:10:07
Enter ".help" for usage hints.
sqlite> .dump
PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
COMMIT;
Foreign keys can cause interesting problems, and SQLite specifically prefers to avoid them.
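If you do want enforcement in SQLite, it has to be turned on per connection (the OFF default shown in the dump is kept for backwards compatibility); a minimal sketch:
sqlite> PRAGMA foreign_keys = ON;   -- must be re-issued on every new connection
sqlite> PRAGMA foreign_keys;
1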
Yes, if you have downstream services that need to make use of your data, you would provide an API. That way you can manage the number of connections, how the queries are formed, any sort of encoding/decoding, authorization, caching, etc. Definitely not limited to read-only.
I would argue that if you have completely different contexts / business lines / concerns (customer service, accounting, biz dev) all directly accessing the same database, you have far, far larger architectural concerns than FKs will ever hope to address.
I strongly believe the opposite. Your business IS your database. How do you handle access to the data without access to the data?
It is inevitable that 2 "organizations" will at some point need access to the same data. You could just have each of them decide how they interact with it. It doesn't matter how they do it because the database itself makes sure invariants are kept true (such as FKs).
Have a single application clearly owning the data. All access to the data is done through that application. Why would you _want_ multiple applications deciding how to handle data in a common store? What about access authorization to sensitive data?
Because that single application already exists, in my case it's postgres.
Pgweb is one of my interfaces, admin dashboard for free, and I am sure the changes I make there are as valid as changes through any other interface.
I can implement a website as a SSR app that talks directly to the db. Maybe tomorrow I decide I need to work on a web scraper that will use python, instead of adding more API endpoints to allow the scraper to talk to the database, I just talk to the database...
The database is that "single application", no need to write custom endpoints for every operation when SQL is good enough.
It's all a spectrum of course, today you can even go as far as to use something like pg_graphql and you don't even need to write a REST api yourself.
Edit: I forgot to answer the last section, but postgres can totally handle authorization with RLS for instance, allowing users to only see their own data, or maybe data marked as public, etc.
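A minimal sketch of that RLS setup, assuming a hypothetical documents table with owner and is_public columns:
-- Each row is owned by a database role; rows marked public are visible to everyone.
ALTER TABLE documents ENABLE ROW LEVEL SECURITY;

CREATE POLICY documents_read ON documents
    FOR SELECT
    USING (owner = current_user OR is_public);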
> Because that single application already exists, in my case it's postgres.
This is exactly what postgres was designed for! Tens of thousands of hours of work over decades to solve the problem of relational database management. That's why we call it an RDBMS!
Which isn't to say you shouldn't make your own API ever. There are a lot of situations where you don't want things to connect directly to postgres.
But you shouldn't be afraid of having multiple systems connect to postgres. It has incredibly mature and robust features to accommodate that use case. It's the expected use case.
If you operate at the scale where you consider the performance implications of foreign keys, you probably are not ok with anyone accessing the database or running just any query against it.
It is not realistic that you can trust everyone who needs to access the data with access to the database as they might easily cause problems with poorly written queries.
Additionally you may want invariants maintained, that the database cannot maintain but an application in front of it can.
Also for historical data or analytical queries postgres is not ideal either, so you probably want to move the data into some OLAP database or datalake.
> I can implement a website as a SSR app that talks directly to the db. Maybe tomorrow I decide I need to work on a web scraper that will use python, instead of adding more API endpoints to allow the scraper to talk to the database, I just talk to the database...
If the website and the scraper are just parts of the same application, it makes sense to do this, but if they are genuinely different applications, I would use different databases here.
> If you operate at the scale where you consider the performance implications of foreign keys, you probably are not ok with anyone accessing the database or running just any query against it.
I used to work on an application where ALL database accesses were via stored procedures. Genuinely the best dev experience, to me, so far.
> I used to work on an application where ALL database accesses were via stored procedures. Genuinely the best dev experience, to me, so far.
I found such an environment to be simply terrible.
In general, I think stored procs certainly have their place, but if ALL access is through stored procs, you better have a schema that’s basically set in stone otherwise dev will turn into a nightmare.
Large ERP systems do that sort of thing as a matter of course and have for decades now. It does require careful planning and design. I mean AR / AP / scheduling / manufacturing / inventory and so on.
The main downside of splitting everything into isolated databases is that it makes it approximately impossible to generate reports that require joining across databases. Not without writing new and relatively complex application code to do what used to require a simple SQL query to accomplish anyway.
Of course if you have the sort of business with scalability problems that require abandoning or restructuring your database on a regular basis, then placing that kind of data in a shared database is probably not such a great idea.
It should also be said that common web APIs as a programming technique are much harder to use and implement reliably than just about any system of queries or stored procedures against a conventional database, due to the data marshalling and extra error-handling code required. The need to page is perverse, for example.
That does not mean that sort of tight coupling is appropriate in many cases, but it is (typically) much easier to implement. Web APIs could use standard support for two phase commit and internally paged queries that preserve some semblance of consistency. The problem is that stateless architecture makes that sort of thing virtually impossible. Who knows which rows will disappear when you query for page two because the positions of all of your records have just shifted? Or which parts of a distributed transaction will still be there if anything goes wrong?
Different business units often want to aggregate, filter, join and transform data in different ways - sometimes in ways that are hard to anticipate.
In general, the closer to the persistence layer you can perform those transformations, the better they will scale. If you pull the transform into the app layer, you need to move and serialize more data. If you pull the transformation into a constellation of apps, you need to move and serialize a constellation of data.
> probably not with the same ORM or even the same language
I mean, they shouldn't? Like you've just identified a bug: another application can access your database. If another department needs your data, they should request an endpoint that you control. You should be using an "application database"[1] not an "integration database"[2].
They are you. You is they. It’s one company with one goal — keep companying.
The idea that every “department” should access every other department’s data through some bespoke interface that the latter department maintains might work at some corporate behemoth, but at almost all other scales is absurd.
This would be a total waste of effort when you need to be building a product and iterating. I hate articles like this because they do a poor job contextualizing the tradeoffs and when it might be appropriate to do the weird exceptional thing.
IMHO if you have a performance-critical case where foreign keys are in the way, load THAT data into an in-memory DB on a recurring basis and serve time-sensitive requests from there.
Foreign keys are slow on delete, not on read. If you have a popular table, say, users, and all other tables refer to it, then deleting a user locks the database for time proportional to the number of foreign keys - good old linear scaling.
It doesn't lock the database, it only locks the rows of the tables that have foreign keys to the popular tables and are referencing the user you are deleting.
Rightly so, because when deleting the user the database needs to do work to keep the referential integrity. Either it nulls the user_id, deletes the rows, or it throws an error.
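Those three outcomes are exactly the referential actions chosen when the constraint is declared; a sketch with hypothetical orders/users tables:
-- Choose what happens to referencing rows when a user is deleted.
ALTER TABLE orders
    ADD CONSTRAINT orders_user_fk
    FOREIGN KEY (user_id) REFERENCES users (id)
    ON DELETE SET NULL;   -- or ON DELETE CASCADE; the default (NO ACTION / RESTRICT) raises an error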
Hi! The author here. I appreciate your reply here and in a few other threads for this post too. I think they are very well thought out. I also like how it spawned more branches of good discussion.
I totally agree that this decision isn't to be taken lightly. That said, I figured I'd also take this opportunity to clarify a few things:
- My post aims to prompt developers to critically assess whether the constraints they employ are truly essential or just conventionally used without question. They should weigh these constraints against the considerations mentioned in the post.
- While I recognize that lock contention is the database's way of managing application code, I propose that if foreign keys aren't necessary from the outset, then lock contention may only serve to impede performance. If you can rewrite your application code, say with `FOR UPDATE SKIP LOCKED` (sketched after this list), great!
- I am also not recommending dropping all FK constraints, but challenging each one to see whether it is necessary; and if it is, that's also good.
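A minimal sketch of the `FOR UPDATE SKIP LOCKED` pattern mentioned in the second point, assuming a hypothetical jobs table drained by competing workers:
-- Each worker claims one pending job without waiting on rows other workers already hold locked.
UPDATE jobs
SET status = 'running'
WHERE id = (
    SELECT id FROM jobs
    WHERE status = 'pending'
    ORDER BY id
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING id, payload;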
I am fine with the concept of foreign keys, but if you have a perfectly designed database, it should be really hard to insert a row in any table by definition.
What would solve this is defaulting to raising foreign key constraint errors at the end of the transaction rather than when rows are inserted. I know postgres has an option for it; it should be the default.
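That option is deferrable constraints; a minimal sketch with hypothetical invoices/customers tables:
-- The FK is only checked at COMMIT, so rows can be inserted in any order within the transaction.
ALTER TABLE invoices
    ADD CONSTRAINT invoices_customer_fk
    FOREIGN KEY (customer_id) REFERENCES customers (id)
    DEFERRABLE INITIALLY DEFERRED;

BEGIN;
INSERT INTO invoices (id, customer_id) VALUES (1, 42);  -- customer 42 does not exist yet
INSERT INTO customers (id) VALUES (42);                 -- now it does
COMMIT;                                                 -- the check runs (and passes) here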
If you do not particularly care about performance or have a great deal of headroom then database enforcement of referential integrity is great. Alternatively you could just write test cases to check for it and not pay the severe performance penalty.
The other major downside of database enforcement of referential integrity is the common need to drop and re-create foreign keys during database schema upgrades and data conversions.
You’re still going to pay the cost of maintaining referential integrity — you’re just doing it on the app side. You can do it faster by being not-correct — eg you don’t need a lock if you ignore race conditions — but it’s not like the database is arbitrarily slow at doing one of its basic fundamental jobs.
Of course, you can just skip the validation altogether and cross your fingers and hope you’re correct, but it’s the same reasoning as removing array bounds checking from your app code; you’ve eked out some more performance and it’s great until it’s catastrophically not so great.
Your reasoning should really be inverted. Be correct first, and maintain excessive validation as you can, and rip it out where performance matters. With OLTP workloads, your data’s correctness is generally much more valuable than the additional hardware you might have to throw at it.
I’m also not sure why dropping/creating foreign keys is a big deal for migrations, other than time spent
It is quite common for modern databases to have multiversion concurrency so that writers do not block readers. If you do not, your transactions should either be awfully short, you should be prepared to wait, or you should implement dirty reads (which are quite common in any case).
You either end up reinventing foreign keys, your support volume will scale faster than your data, or user experience will suffer.
There may be situations where foreign keys become too much overhead, but it's worth fighting to keep them as long as possible. Data integrity only becomes more important at scale. Every orphaned record is a support ticket, lost sale, etc.
Orphaned detail records are usually inconsequential, like uncollected garbage. References to anything with an optional relationship should use outer joins as a matter of course. If you delete something that really needs to be there you have a problem, which is one of the reasons not to delete rows like that, ever, but rather to mark them as inactive or deleted instead.
Typically you look for orphan rows - the sort of thing ON DELETE CASCADE was invented to prevent. Another thing to check for are records that need to exist but should have references cleared when something else is deleted, e.g. ON DELETE SET NULL. And the third thing is ON DELETE RESTRICT.
You can check for the first two of those things after the fact, and they are relatively benign. In many cases it will make no difference to application queries, especially with the judicious use of outer joins, which should be used for all optional relationships anyway.
If you need ON DELETE RESTRICT application code should probably check anyway, because otherwise you have unexpected delete failures with no application level visibility as to what went wrong. That can be tested for, and pretty much has to be before code that deletes rows subject to delete restrictions is released into production.
As far as race conditions go, they should be eliminated through the use of database transactions. Another alternative is never to delete rows that are referred to elsewhere and just set a deleted flag or something. That is mildly annoying to check for however. Clearing an active flag is simpler because you usually want rows like that to stay around anyway, just not be used in new transactions.
This effectively means you are building an embedded database in your application and using the networked database for storage. There are a few reasons to do this and a million reasons not to.
I once worked somewhere that used rails in lieu of foreign keys. The result was a brittle nightly delete_orphaned_records script as well as obscure user-visible bugs. My team started adding foreign keys to our records and unsurprisingly caught bugs in our application code that otherwise would have been missed. Personally, I think the default should always be to use foreign keys, and only if you have a genuine scalability problem might you consider dropping them or putting them in a different database with lighter data integrity guarantees.
This anecdote is not very surprising to me. Foreign keys are on a scale of DB constraints. I opt to add a maximum of constraints where I can. Adding a date that should be in the future - I'm adding a constraint! Did you mess up your time zone conversions & are trying to add a date in the past - the constraint catches it. At some point the constraints are strong enough where with high confidence the following statement can be said: "if the data exists in the database at all, it is correct & complete." This is the "database as fortress" principle, that only correct data should ever be allowed into the database to begin with.
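For example, the future-date rule described above can go straight into the schema (a sketch, assuming a hypothetical bookings table):
-- Rejected at write time if a timezone bug produces a date in the past.
ALTER TABLE bookings
    ADD CONSTRAINT bookings_starts_in_future
    CHECK (starts_at > now());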
I agree and makes sense. Starting out with them and then challenging the setup is a good practice. I like the idea of moving to a lighter data integrity setup too (ofc when possible)
I worked on a couple projects in the early aughts like this (php / mysql). Even did it myself for a project, and fortunately learned my lesson early enough to fix that very project and rely on the database to handle data integrity.
Always important to remember that while learning from others is important, they're just as human as you are.
Foreign keys are often the best (and only) documentation of the data model, and the only one guaranteed not to decay, so this is an important consideration when rethinking them.
That said, I quite like this post and the way the author is thinking critically about software decisions most of us take for granted. Even if you decide that foreign keys are always best, this type of thinking is a great way to improve your understanding.
I agree - and I also appreciate the author's objectivity and focus on making the correct technical decision for the problem. Dogma is the bane of good architecture.
I completely agree with the idea that foreign keys aren't always recommended. For example, one of the reasons that MongoDB was so great at edge writes was because of the lack of referential integrity and the "document" as a row concept - denormalize the data into a document and store everything you need there.
That works great for high volume edge writes that you want to operationalize later, asynchronously. Sure, the data can be messy and may have errors - but many use cases tolerate this well.
The nice thing about modern postgres is that you get the best of both worlds, RDBMS stuff when you need it, and NoSQL stuff if you don't.
The author's point, which I think is an excellent one, is that just because it's in a database, it doesn't necessarily make sense to go full 4th normal form on everything.
I was recently working on a system that split data across multiple database instances (unnecessarily) and that means referential integrity is lacking and sometimes a huge problem.
To expound, I was tracking down something called a "slotid", and the absence of foreign keys does not mean anything. That data could very well live in some other table on the other DB instance. It turned out though that "slot-id" instead referred to a start time of day as counted by 'minutes-from-midnight'. Thus "slot-id=375" was just "6:15am"... Yup, 3 hours of my life I will not get back to realize that data did not refer to anything at all. If everything that could have had foreign keys did, it would have been a super quick investigation to realize the data was not a reference to anything at all.
Unless your project is small and for learning, or won't be around for very long, Yes. You absolutely do or will end up in a technical debt world of hurt. I have worked on systems without them that are 15+ years old and the vast, vast majority of fixes and refactoring were self inflicted wounds like this. We added FK's and indexes where necessary and wound up faster and safer.
That sounds awful. My first instinct would be to see where an FK constraint would fail, and then ask the business what it should be. There is no guarantee that every business case from way back needed an FK, or that anyone in the business knows which FK value should be there. It sounds absolutely miserable.
In my experience, adding proper constraints will not only ensure code can never write invalid data to disk, it will also help you realize when business rules are lacking or fail to consider edge cases.
The Salesforce CRM application somewhat "famously" does not use native DB foreign keys to model most relationships, and has a custom relational integrity and indexing layer.
But interestingly, this also lets our internal data modelers choose to make a relationship e.g. leave dangling records or have other update/delete patterns that are appropriate for the data and data volume. Obviously there are cleanups and tradeoffs required here. It's certainly not a model I'd suggest people start with, but there really are times where database native FKs aren't the answer.
Foreign keys allow for pushing a really vital piece of business logic down to the database itself, referential integrity.
This can be done in application logic, but that risks bugs allowing broken references into your data. It's more foolproof when these checks are enforced directly at the db, making sure data is valid before it's stored or updated.
Not to mention that most SQL databases have complex consistency rules. Just checking that the target row exists (or not) before adding (or deleting) a record may not result in a correct database state in the face of concurrent transactions.
You better very carefully read your database's concurrency guarantees and ensure that all transactions are running with the right consistency level otherwise you will have a bad day.
On top of that the JOIN method of following foreign keys often leads to missing rows if expected keys don't exist which can be very hard to debug.
There was an interesting topic on r/experienceddevs [1], where the dev team was arguing with some DBAs on adding another column. The DBA insisted on using FKs over enum / application level logic, etc. The comments there present some excellent arguments on top of everything you said.
True, a foreign key by itself is just a business requirement. A foreign key constraint is what the database does to enforce that. There's rarely any confusion about the two concepts, though.
I do consulting for a lot of different companies. The older ones who did not religiously use PK/FK constraints have databases that are nightmares to maintain. The data will start to rot surprisingly quickly, devs will react by writing weird code to compensate, and your life will not be fun.
I worked in different banks for the better part of a decade, lots of internal applications, each with their own DB.
They all had PK/FK/unique/check constraints, and they were still a clusterfuck to understand and maintain. Lots of outright stupid and dangerous code to work around the issues.
Still, if there were no constraints, it would have been even worse, as some 'intelligent' people temporarily disabled constraints to update data, resulting in data integrity hell.
> Personally, it took me quite a few years to make up my mind about whether foreign keys are good or evil, and for the past 3 years I'm in the unchanging strong opinion that foreign keys should not be used. Main reasons are:
> * FKs are in your way to shard your database. Your app is accustomed to rely on FK to maintain integrity, instead of doing it on its own. It may even rely on FK to cascade deletes (shudder). When eventually you want to shard or extract data out, you need to change & test the app to an unknown extent.
> * FKs are a performance impact. The fact they require indexes is likely fine, since those indexes are needed anyhow. But the lookup made for each insert/delete is an overhead.
> * FKs don't work well with online schema migrations.
> FKs are a performance impact. The fact they require indexes is likely fine, since those indexes are needed anyhow. But the lookup made for each insert/delete is an overhead.
This is not a valid argument at all and I'm concerned anyone would think it is.
If you have a foreign key, it means you have a dependency that needs to be updated or deleted. If that's the case, you will have an overhead anyway, the only question being whether it's at the DB level or at the application level.
I don't think there are many cases where there's any advantage to self-manage them at the application level.
> FKs don't work well with online schema migrations
This seems to be related only to the specific project that the issue is about if you read about the detailed explanation below.
> If that's the case, you will have an overhead anyway, the only question being whether it's at the DB level or at the application level.
Inserts and updates do not require referential integrity checking if you know that the reference in question is valid in advance. Common cases are references to rows you create in the same transaction or rows you know will not be deleted.
If you actually want to delete something that may be referred to elsewhere then checking is appropriate of course, and in many applications such checking is necessary in advance so you have some idea whether something can be deleted (and if not why not). That type of check may not be race free of course, hence "some idea".
> the only question being whether it's at the DB level or at the application level.
It is not a binary situation like that. With the rise of 'n-tier' systems that are ever so popular today, there are often multiple DB levels. The question is not so much if it should go into the end user application – pretty much everyone will say definitely not there – but at which DB level it should it go in. That is less clear, and where you will get mixed responses.
Note that this was written in 2016 in the context of a mysql-centric project. You will not find an "unchanging strong opinion that foreign keys should not be used" outside that context.
I haven't kept up with mysql enough to know if there are still good reasons to avoid foreign keys. I just stick with postgresql.
You don’t need foreign keys, no. But you do need referential integrity and foreign keys implemented and enforced by the database is usually the easiest.
If you’re disciplined it’s not difficult to implement yourself in the application, but the challenge is if folks can access the database directly. If so, good luck, since it’s inevitable that they end up making modifications and break integrity.
If one insists on not having foreign keys (and there are decent reasons to not have them), I would really really suggest not allowing direct access to the database and enforce whatever application be the main or only client to the database.
Foreign keys are wholly optional if you’re the kind of person who thinks it’d be cool to add “manually repair millions of rows” and/or “lose half a day of data; cry” to the todo list for a 3AM emergency deployment rollback.
The main advantage is it's WAY faster if you are writing a lot.
The big disadvantage of foreign keys is they do verify integrity. That means making lookups on every write/update which can be very costly especially as the model becomes more complex.
If you have a read-heavy application with low levels of writes then by all means put in foreign keys to your heart's content. But if you are at a point where you have millions or billions of records in multiple tables, they simply aren't feasible.
But you pay the cost in checking it in the application, as GP said. If so, it simply is moving the cost from db to application layer. Is there a reason the checks can be implemented more efficiently in the application than the DB can?
Referential integrity problems usually happen due to missing deletes, improper deletes, or references that should be cleared.
The overhead of checking for the existence of referred-to records in ordinary inserts and updates in application code is unnecessary in most cases, and that is where the problem is. Either you have to check to have any idea what is going on, because your key values are being supplied from an outside source, or you should be able to write your application so that it does not insert random references into your database.
If you actually need to delete a row that might be referred to, the best thing to do is not to do that, because you will need application level checks to make the reason why you cannot delete something visible in any case. 'Delete failed because the record is referred to somewhere' is usually an inadequate explanation. The application should probably check so that delete isn't even presented as an option in cases like that.
> If you actually need to delete a row that might be referred to, the best thing to do is not to do that, because you will need application level checks to make the reason why you cannot delete something visible in any case. 'Delete failed because the record is referred to somewhere' is usually an inadequate explanation. The application should probably check so that delete isn't even presented as an option in cases like that.
I feel like this belongs to the same strategy as duplicating form-validation on frontend/backend. The frontend validations can't be trusted (they can be skipped over with e.g. curl POST), so backend validation must be done. But you choose duplicate it to the frontend for user-convenience / better reporting / faster feedback loop. The backend remains the source of truth on validations.
The same between database and application; the database is much more likely to be correct when enforcing basic data constraints and referential integrity. The application can do it, it's just a lot more awkward because it's also juggling other things and has a higher-level view of the data (and the only real way to check you didn't screw up is to make your testcase do exactly the same thing... but be correct about it -- no one else is going to tell you your dataset got fucked. Also true in an RDBMS, but there it's trivial to verify by eye, and there's only one place to check per relationship). Thus in my world-view, the database must validate, and the application can choose to duplicate validation for user-convenience / better reporting. The database remains the source of truth on validations. As an optimization, you remove the database validations, but at your own risk.
And then in a multi-app, single db world, then you really can't trust the application (validations can be skipped), so even that optimization is likely illegal. Or you do many-apps *-> single-api -> db, and maintain the optimization at the cost of pretty much completely dropping the flexibility of having an RDBMS in the first place
At my first job they used mongodb for no reason other than it was the fad at the moment, with the big data and so on.
There were often crashes in the application because our records followed several different schemas, because of bugs in the application that were later fixed, behaviour that got changed, the ORM being replaced with hand-written code that was much faster…
Basically, what happens with overconfident developers and a CTO who think they are very good developers and aren't.
Your argument, if I understand it correctly, is that the constraints can be implemented in the application layer more efficiently - either in dev time or CPU time - than it can be in the DB layer. Why is that so, and under what circumstances?
This is your typical false economy trading a very small performance gain for strongly degraded data integrity.
If the indexes are too much to ask you're basically saying that the referred-to foreign records are never looked up (they have no key index!) and/or that the foreign records are never gathered up for the referring table. If that's the case the problem isn't having superfluous indexes, it's that you have superfluous data in your database.
> a very small performance gain for strongly degraded data integrity.
It's not very small. Doing a lookup on every write can be hugely detrimental to performance especially with large numbers of writes.
> If the indexes are too much to ask you're basically saying that the referred to foreign records are never looked up (they have no key index!)
That does not follow. The lookup for foreign records is not free so avoiding doing it when you don't need to will gain faster performance vs doing it all the time. Indexes make lookups faster, they don't make them free.
Further, you have to consider the impact of locks on such a system. Writes to a table that references another will lock the referenced rows in the second table while the write is in flight. So if I wanted to update that referenced record in any way, that task now gets blocked until the data integrity check finishes.
In MSSQL, that lock is held until the end of the transaction.
Batch processing is also WAY faster (like 1-2 orders of magnitude), but for whatever reason people don't talk about it much (cynically, I guess because it's too straightforward and old-hat). As far as I can tell, Rails doesn't have multithreading/async, so the OP is presumably not batching requests. That seems like a much better place to start than giving up referential integrity.
We are batch processing. I don't know why you assume we aren't. FK constraints are simply too much burden for us. I'm not the OP, however, my company has experienced the performance headaches of FKs.
I didn't assume anything about you; I said OP uses a framework that (as far as I know) makes it difficult to batch requests, so they're probably not batching requests. And they mention doing a `before_create` hook, which again suggests they're operating on individual models. If you're already batching and still running into a wall, then yeah you might need to remove constraint validation or shard. But batching alone can probably take most people far past the scale they'll ever see.
> What would be the advantage of implementing it in the application?
It is always implemented in the application. The question is about whether you do it in your own application or use someone else's application.
The disadvantage of doing it in your application is that you have to do the work that someone else has probably already done, and are likely not as great of a programmer as that other person, thus more likely to screw it up.
It's basically like saying you're ok with dangling pointers or orphaned records, or that you're awesome enough not to need these "slow", annoying safety features.
That's all fine, but we know how this ends up in reality.
Checking that child records that can exist only if there is a parent record are deleted when the parent is deleted and that the foreign key you're using in a record actually exists in the pointed-to table, for the price of a single index you very likely need anyway, is hardly a problem.
You don't need to cascade deletes or anything, that's a red herring, as are migration issues, where you can turn off referential integrity while the process is performed.
Using an intermediate client would go against many of the claimed performance benefits of not using the FK. One would also likely lose the ACI from ACID. Replicating those features in the intermediate client sure is risky; it would require significant expertise. I'd imagine the implementation would leverage transactions, which would be 'round-trip' to the database server and thus the lock contention would still exist and also be substantially worse (due to the round-trip latency).
That is great, and I noticed it’s something I have always assumed should be the case - no direct access to the database outside your application / ORM. Because then you open a can of worms.
Doesn't help much though. It only takes not wrapping related inserts in a transaction in one place in your code. I know zero frameworks/ORMs that can help you with that.
I usually wrap everything (including reads) in a transaction at middleware level. Not perfect, has its downsides but for many projects it's perfectly ok.
Of course it's not a real substitute for foreign keys, but definitely better than nothing.
Ahh the old days of LAMP when the M was mysql and foreign keys were just a dream...
When you reduce complexity and take off the safeguards things get faster! Cock that foot gun and hope that it doesn't go off!
Can you do what the author suggests? You sure can, and we did it for a long time with MySQL. Should you? It depends on your team, how in tune they are with working with databases, SQL, etc...
There was a funny NoSql video that said something to the effect of, "if you are willing to sacrifice everything for speed, just pipe all your data to /dev/null."
If you are willing to forego 3rd normal form and just duplicate data like mad, and have a million columns like a giant Excel sheet, then you really don’t need foreign keys.
MyISAM I think was the table type that did NOT support FK constraints and was blazing fast. InnoDB had FK constraints.
You would fully normalize, use proper joins etc. But without FK constraints you could end up with orphan data... Not the worst thing in the world, depending on how the joins in your system were structured.
Care and diligence were the order of the day. You had to know your schema and make sure you were doing the RIGHT thing at all points in the stack.
Large scale MySQL databases I've worked on typically do not use foreign keys. The payoff is higher write performance. They're implicit based on table/column naming and relationships defined in code, e.g. Rails associations or process/operation classes.
With a certain level of team maturity and thoughtful reviews, this has rarely been an issue. Sometimes there are orphaned rows (from incomplete/buggy writes), which also get cleaned up, typically by the jobs that prune data that has gone 'out of the retention window'.
Depends on your definition of large-scale, but I managed a 120K QPS MySQL 8.x cluster that heavily used FKs. After heavy optimization and parameter tuning, I was also able to halve the instance size while improving query performance.
I’m a big fan of FKs. They stop you from doing stupid shit, and if the time ever truly comes that you have to drop them, you can do so.
I have seen many people running MySQL make that claim about lots of things... But what I have never seen is a MySQL database for business data without major issues.
I work on a legacy MySQL database that has 100+ moderately-related tables with zero foreign key constraints.
Someone wanted to change the name of a particular object, so I did a little investigation and found that approximately two dozen tables were storing this name-as-key, with at least half a dozen different column names. So essentially we have to scan every row in the entire database to robustly maintain referential integrity, and I vetoed the largely cosmetic change. Granted, this isn't 100% a FK constraint issue, but I imagine if the original DBA knew what FKs were, I wouldn't be in this mess.
On the other hand, I have implemented prototypes that clearly had way too many FK constraints, which leads to a bloated schema of unnecessary tables with 1-2 columns each, and makes object lifecycles a spaghetti nightmare.
Having worked in a db with lots of missing FKs, it becomes very difficult to figure out relationships. This makes it difficult to find out what tables need to be updated when there's a lot of them. In this particular company, the only thing holding it together was institutional knowledge.
I worked for a company where we would not use features that depended on foreign keys (cascading deletes, etc.), but we would define the foreign keys 100% of the time. We ran all of our staging environments with the keys, but when it came to prod our DBA would remove them, among other optimizations. I've not done this at other places that didn't have a dedicated DBA or two, though.
Implementations with skin in the game are still innovating and improving the 'tried and true' relational database model, IMO. PostgreSQL, MS SQL, MySQL, etc. consistently churn out new features that actually make sense, like FK constraint support for sharded/partitioned tables relevant to this discussion.
If you work on a large scale system that doesn't include foreign key constraints, try running some queries that should never return results because that would expose data integrity problems.
If your system is hooked up to a data warehouse you can run queries there, too.
I bet you can find all sorts of weird edge-case records doing this.
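A minimal sketch of such a query, with hypothetical orders/users tables; if integrity has held, it returns nothing:
-- Orphaned children: orders pointing at a user that no longer exists.
SELECT o.id
FROM orders o
LEFT JOIN users u ON u.id = o.user_id
WHERE o.user_id IS NOT NULL
  AND u.id IS NULL;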
On that note, has anybody figured out a nice way to combine foreign keys and soft deletion (that is, a deleted_at column)?
Soft deletion is occasionally useful, but losing foreign keys for it is a big pain.
EDIT: got a bunch of responses, thanks! To be clear, the issue I have in mind is e.g. you want to have a foreign key that makes sure the “singular” side of a one-to-many relationship isn’t soft-deleted (on delete restrict) as long as it has anything in the “many” side. Both sides using soft-deletion.
By soft deletion, do you mean you would want to ever re-instate the record and make it valid again, PK/FK constraints and all? Or tracking deleted records? For the latter I use audit tables + triggers to track the changing values.
At a basic level you could duplicate the entire record into the audit table on every action, e.g. the audit table would look like `audit_id | record_id | user/process_id | action (insert, update, delete) | timestamp | ...<record rows>`.
You can optimize it to not duplicating column values unless necessary. On inserts you only need the metadata of the action. On updates, the old value of columns with changes goes into the audit table. On deletes the whole record goes into the audit table.
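A rough sketch of that trigger-based approach in PostgreSQL, with hypothetical names (orders, orders_audit); for brevity it stores the whole old row on updates rather than only the changed columns:
CREATE TABLE orders_audit (
    audit_id   bigserial   PRIMARY KEY,
    record_id  bigint      NOT NULL,
    action     text        NOT NULL,               -- 'INSERT' | 'UPDATE' | 'DELETE'
    changed_at timestamptz NOT NULL DEFAULT now(),
    old_row    jsonb                               -- populated on UPDATE/DELETE
);

CREATE FUNCTION orders_audit_fn() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        INSERT INTO orders_audit (record_id, action) VALUES (NEW.id, TG_OP);
    ELSE
        INSERT INTO orders_audit (record_id, action, old_row)
        VALUES (OLD.id, TG_OP, to_jsonb(OLD));
    END IF;
    RETURN NULL;  -- return value is ignored for AFTER row triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER orders_audit_trg
AFTER INSERT OR UPDATE OR DELETE ON orders
FOR EACH ROW EXECUTE FUNCTION orders_audit_fn();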
Just brainstorming - perhaps one way to do that would be a multi-column FK towards the row ID combined with an is_deleted column, which would mean that the FK constraint also enforces updating the related rows if something is deleted.
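A sketch of that idea with hypothetical tables: the parent exposes (id, is_deleted) as a referenceable key and live children are pinned to the non-deleted version, so soft-deleting a parent that still has children fails much like ON DELETE RESTRICT would:
CREATE TABLE parents (
    id         bigint  PRIMARY KEY,
    is_deleted boolean NOT NULL DEFAULT false,
    UNIQUE (id, is_deleted)                        -- composite target for the FK
);

CREATE TABLE children (
    id                bigint  PRIMARY KEY,
    parent_id         bigint  NOT NULL,
    parent_is_deleted boolean NOT NULL DEFAULT false CHECK (NOT parent_is_deleted),
    FOREIGN KEY (parent_id, parent_is_deleted) REFERENCES parents (id, is_deleted)
    -- UPDATE parents SET is_deleted = true ... now errors while live children still reference it
);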
The `deleted_at` column approach to soft deletion has enough downsides that I would choose to just move "deleted" records into a different table.
Having a separate table for deleted records means that
- FK references to the main table will just work
- the main table can be kept clean.
- it avoids the class of problems where a user or the application treats soft-deleted records as real records, because they weren't aware of the `deleted_at` column.
I see at least two possible approaches (each with their tradeoffs, neither great):
1. For each relevant foreign key, have an additional column without a constraint where the foreign key value can be copied to before deletion. (This requires that the original foreign key be nullable.)
2. Make soft deletion universal so that foreign keys can remain and any associated rows will still exist.
I assumed cube2222 meant where soft-delete wasn't universal: so you might really delete the parent row but want to soft-delete the child rows. Though I would suggest this is a bad idea: if you want to soft-delete child rows you should enable it on parent rows too.
E.g. you want to have a foreign key that makes sure the “singular” side of a one-to-many relationship isn’t soft-deleted as long as it has anything in the “many” side. Both sides using soft-deletion.
A better post than I expected. The only thing I'd add is that foreign keys can actually improve read times because the optimiser knows it can safely skip certain joins e.g. if you have inner equi-joins between tables a, b and c
a join b join c
If there is an FK from a to b, and likewise from b to c, and you don't use anything in b, then the optimiser can rewrite this to
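The simplest form of that join elimination, sketched with hypothetical tables where a.b_id is a NOT NULL foreign key to b(id) and no columns of b are selected (SQL Server, for example, can do this for inner joins when the constraint is trusted):
-- Original: the join to b exists only to follow the FK.
SELECT a.x FROM a JOIN b ON a.b_id = b.id;
-- The FK guarantees every a row matches exactly one b row, so the planner can run:
SELECT a.x FROM a;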
Yes, I remember using foreign key relationships to optimize symmetric hash join back when I worked on an OLAP db. The idea was that for a 1:1 relationship you can immediately discard both sides of a joined tuple, while for a 1:many relationship you can immediately discard the "many" side of a joined tuple.
I wonder whether there’s a measurable performance difference if instead of calling delete two times he just used the second delete with “on delete cascade”?
Whenever I read posts like this I know, despite all the caveats at the start of the article, a large number of people will take away "we shouldn't have any foreign keys because they impact performance".
I suggest that anyone thinking that has a look at a database where there are no foreign keys.
It was a nightmare. The article seems to basically be saying "but it's hard!". Well, I'd rather put down the extra effort so that I don't have corrupt data.
You know what's really hard? Fixing corrupt data.
The data is the most important thing in a database.
Despite the title, this article is about dropping foreign key constraints, not about dropping foreign keys altogether (which is possible, see below).
You can drop foreign key constraints and enforce referential integrity in your code, but from a logical standpoint your foo.barId column is still a foreign key.
Actually getting rid of foreign keys on the other hand can be done in a data store that allows nested structures, like e.g. a document based store. The price to pay is data duplication and denormalisation (and the related complexity you will have to deal with in all write operations), the pros are ease and speed of retrieval.
Novice Question: How can you have a relational database at all without foreign keys?
In other words, if your tables aren't referencing other tables, how do you perform JOINS for your queries? From my limited experience, they seem pretty essential for most databases. Or is the article just talking about foreign key CONSTRAINTS and cascading UPDATES and DELETES, rather than literally talking about the values stored in foreign key columns?
If anyone can explain, I would greatly appreciate it.
Foreign keys are really just constraints - the row referred to by the FK must exist, and the database will ensure that. That is the basic idea. Cascading, etc are just useful features on top.
While FKs and JOINs often go hand in hand, JOINs actually do not require foreign keys or any other constraint. You can join any table and column that you want regardless of FK or index or whatever (although you have to be careful about performance).
I see foreign keys the same way I see types. You may not need them, but they catch a lot of mistakes and give you more confidence in the integrity of your system.
And then, don't just drop the constraints because you read an article about how FK constraints don't work with a distributed relational model, take a careful look at the docs of whatever DB engine you're using. PostgreSQL 12 for example supports foreign keys with either side being a partitioned table.
Is 3rd normal form not something used in design any more? Obviously I dropped off more than 20 years ago, but I distinctly remember that design saving our product's evolution over 10 years or more - really, a working early-90s-designed system in the 00s.
Referential integrity saves you so many headaches. I am dealing with a 500+ table database with no foreign keys at all. This has allowed so many problems to creep in. Writing queries is pure suffering.
The performance gain from dropping foreign keys doesn’t hold water. You still have to do the referential checks in the application or in the ORM code. Unless you meant dropping referential checks.
# Find correct ID.
obj.some_id = run_query("select some_id from tbl where x=?", param)
Lots of variation on that, but the user_id and some_id here are pretty much guaranteed to be accurate when implemented correctly. The biggest potential issue might be race conditions with deletes on the parent ID, but just having soft deletes sufficiently alleviates that (potential) small issue.
deals = sql("select * from current_deals ...");
// Show the user the deals.
// User takes his sweet time to ponder over the deals.
// Meanwhile, the deals are gone.
order = new_order()
order.deal_id = deals[5].id
order.insert()
// User calls in to complain never receiving the deal.
// Ok, need to verify the deal still exists.
sql("select deal from current_deals where id = ? for update", deals[5].id)
... insert ...
sql("commit")
That second select .. for update is referential checking in code.
The only time I could see it really impacting performance is during bulk operations. But that's where making FK constraints deferred and wrapping the bulk actions in a transaction makes more sense.
We had some missing FK constraints, along with missing cleanup in code for a given table. Some customers had many millions of orphaned rows, while only thousands of live rows.
My rule of thumb has been: enable them strictly in DEV and INT environments, disable in PROD. They can catch schema discrepancies, but can impede ingestion rates.
Also some referential errors are sort of ok in PROD, as long as it's only about not dropping user data, which can be dealt with later on (INT gets reset with PROD user data from a backup each week; it also helps in the restore plan: FKs are enabled, errors are caught, then data gets pruned heavily).
If referential integrity is a business-level bug, then of course we should enable them.
To me that seems like an invitation for catastrophic bugs or problems to creep through and surprise you in prod, since the lower envs are running a different configuration.
I suppose it's entirely dependent on the type of data you're working with though.
Whatever you do, always have your dev/test environment identical to production.
Have a load balancer or replication in prod? You must have the same when you test.
Otherwise, you will have a bad time m'kay.
And don't get me started about removing FK constraints; anybody doing that on purpose is either very, very smart, or very, very ignorant/inexperienced.
>My rule of thumb has been: enable them strictly in DEV and INT environments, disable in PROD
I mean I think the constraints should be on in all environments, but disabling them in prod but not dev seems utterly backwards? Protect your test data from getting corrupted but not your actual customer data?
That sounds like a terrible assumption to stake your business on though, especially for what are really marginal gains for the 99% of companies who don't need the tiny performance boost from not having foreign keys.
How often is data actually DELETEd from production databases? Unless following through with regulatory removal such as GDPR, it’s far better to use an isdeleted flag IMO (so you can un-delete if the action was a mistake).
There's also an impact on inserts; it's not just deletes.
Essentially a foreign key constraint is an on {insert,update,delete} trigger which checks whether the target row exists, so that's a select on the target table. I'm not sure if that's still the case, but I believe that for a long time foreign keys were just implemented as triggers in PostgreSQL.
For a lot of things that has a minimal performance impact and it's not a big deal. For some other things it can really add up.
I assume you mean "performance overhead"? For insert-heavy workloads the difference very much is not "very minimal". In a quick test it's ~100ms vs. ~20ms with two foreign keys which check against a "countries" table (250 rows) – one column for nationality and one for residence.
±a lot because this is a very quick test I ran just twice, but it fits what I measured before when I very much ran into this performance penalty a few years back.
And look, 20ms vs. 100ms is very much fine for a lot of use cases. But it's also not for a lot of others. And it's certainly not very minimal, and with many inserts it does amplify a lot.
That's true, but managing the "soft" referential integrity becomes a big authorization/security headache. How do you prevent/allow "deleted" objects from being accessed indirectly?
I like Django's ORM approach which allows you to easily set baseline filters for your Model Managers, so you can implicitly exclude "deleted" objects from most queries.
Good point. I’m one of those “never-ORM” people, so writing & maintaining the SQL is prone to error when the DB gets complex, and it does introduce mental overhead during dev.