This is great, but what possible counterargument is there? We should prolong ind...

jerf · 2024-11-16T20:57:06

Easy: Short term risk versus long term risk. If I deploy with minimal changes today, I'm taking a non-zero short-term risk for zero short-term gain.

While I too am generally a long-term sort of engineer, it's important to understand that this is a valid argument on its own terms, so you don't try to counter it with just "piffle, that's stupid". It's not stupid. It can be shortsighted, and it leads to a slippery slope where every day you make that decision it is harder to release next time, and there are a lot of corpses at the bottom of that slope, but it isn't stupid. Sometimes it is even correct: for instance, if the system is being deprecated anyway, why take any risk?

And there is some opportunity cost, too. No matter how slick the release, it is never free. Even if it's all 100% automated, it's still going to barf sometimes and require attention that skipping the release would not have. You could be doing something else with that time.

macintux · 2024-11-16T20:33:52

In some environments, deploying to production has a massive bureaucracy tax. Paperwork, approvals, limited windows in time, can’t do them during normal business hours, etc.

josho · 2024-11-16T22:18:55

Those taxes were often imposed because of past engineering errors. For example, "don't deploy during business hours" because a past deployment took down production for a day.

A great engineering team will identify a tax they dislike and work to remove it. Using the same example, that means improving the success rate of deployments so you have the data (the success record) to take to leadership to change the policy and remove the tax.

ukuina · 2024-11-16T20:18:59

Finite compute, people, and opportunity cost.

It is just a reframing of build vs. maintain.

rconti · 2024-11-16T21:46:46

The counterargument is obvious for anyone who has been on call or otherwise responsible for system stability. It's very easy to become risk-averse in any realm.

andai · 2024-11-16T23:27:05

Doesn't ensuring stuff actually works tangibly lower risk?

mbrumlow · 2024-11-17T03:40:41

Yes, because it lowers compound risk. The longer you go without stressing the system, the more likely you are to have a double failure, which increases your outage duration.

Simply put: you don’t want to delay finding out something is broken; you want to know the second it breaks.

In the case I am suggesting, a failed release will often be deploying the same functionality, so many failure modes will result in zero outage. Not all failure modes will result in an outage.

When the software is expected to behave differently after the deployment, more systems can end up being part of the outage: for example, the new systems can’t do something, or the old systems can’t do something.
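To put rough numbers on the compound-risk point, here is a back-of-the-envelope sketch under an entirely made-up model: assume each day a latent fault (something that has broken but nobody has noticed) appears independently with probability `p`, and that faults only surface when the system is stressed, e.g. by a deploy. The function name `p_multiple_faults` and the value `p = 0.01` are hypothetical, chosen for illustration only; real fault arrival is neither independent nor constant.

```python
# Hypothetical model: each day a latent fault appears independently with
# probability p, and all accumulated faults surface together at the next deploy.
# The number of faults waiting at deploy time is then Binomial(days, p).

def p_multiple_faults(p: float, days: int) -> float:
    """Probability that two or more latent faults have piled up by the next deploy."""
    none = (1 - p) ** days                          # P(0 faults)
    exactly_one = days * p * (1 - p) ** (days - 1)  # P(exactly 1 fault)
    return 1 - none - exactly_one                   # P(2 or more faults)

if __name__ == "__main__":
    # Compare release cadences under an assumed 1% daily latent-fault rate.
    for days in (1, 7, 30, 90):
        print(f"{days:>3} days between deploys: {p_multiple_faults(0.01, days):.4f}")
```

Under this toy model, deploying daily makes a simultaneous double failure essentially impossible, while a quarterly cadence leaves roughly a one-in-five chance of discovering two or more problems at once.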

Jach · 2024-11-17T03:34:10

Not exactly, but it's worth experimenting anyway. Say you currently release once every few months; an ambitious goal would be to get to weekly releases. That's continuous enough by comparison. But "lower risk" is probably not the leading argument for the change, especially if the quarterly cycle has worked well enough, and the transition itself increases risk for a while. For a committed attempt not to devolve into a total dumpster fire, various other practices will need to be added, removed, or changed. (For example, devs might learn the concept of feature flags.) The people involved, including management, might not be able to pull it off.

mplewis · 2024-11-16T20:16:26

The common and flawed counterargument is “when we deploy, outages happen.” You’ll hear this constantly at companies with bad habits.

kortilla · 2024-11-16T20:55:27

Deploying is expensive for some models. That could involve customer-facing written release notes, etc. Sometimes the software has to be certified by a government authority.

Additionally, refactor circle jerks are terrible for backporting subsequent bug fixes that need to be cherry-picked to stable branches.

A lot of the world isn’t CD, and there constant releases are super expensive.