> In the hardware space, it took months to years to provision new machines or upgrade OSes.
If it takes that long to manage a machine, I strongly suspect the engineers who initially designed the system failed to account for provisioning and upgrades for some reason. Was that true in your case?
From the late '00s until the mid '10s I worked for an ISP startup as a SWE. We had a few core machines (database, RADIUS server, self-service website, etc.) - an ugly mess TBH - initially provisioned and managed entirely by hand, as we didn't know any better back then. Naturally, maintaining those was a major PITA, so they sat on the same dated distro for years. That was before Ansible was a thing, and we hadn't really heard of Salt or Chef before we started to feel the pain and went looking for solutions. Virtualization (OpenVZ, then Docker) softened a lot of the issues and made the components significantly easier to maintain, but the pains from our original sins were felt for a long time.
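For context, the core idea we were missing back then is the idempotent "describe the desired state, converge to it" pattern those tools are built around. A toy sketch of it in Python (the package name and config path are just illustrative, not what we actually ran):

```python
#!/usr/bin/env python3
"""Toy sketch of the idempotent converge-to-desired-state idea behind
tools like Ansible/Salt/Chef. Package name and config path are made up."""

import subprocess
from pathlib import Path


def ensure_package(name: str) -> None:
    """Install a package only if it is not already present (Debian-ish)."""
    check = subprocess.run(["dpkg", "-s", name], capture_output=True)
    if check.returncode != 0:
        subprocess.run(["apt-get", "install", "-y", name], check=True)


def ensure_file(path: Path, content: str) -> bool:
    """Write a config file only if it differs; return True if it changed."""
    if path.exists() and path.read_text() == content:
        return False
    path.write_text(content)
    return True


if __name__ == "__main__":
    ensure_package("freeradius")                       # illustrative package
    changed = ensure_file(
        Path("/etc/freeradius/clients.conf"),          # illustrative path
        "client nas1 { ipaddr = 10.0.0.2 }\n",
    )
    if changed:
        # Only restart the service when its config actually changed.
        subprocess.run(["systemctl", "restart", "freeradius"], check=True)
```

Running that twice in a row does nothing the second time - which is exactly what hand-managed boxes never gave us.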
But we also had a fleet of other machines, where we understood our issues with the core servers well enough to design the new nodes to be as stateless as possible, with automatic rollout scripts for whatever we were able to automate. Provisioning a new host took only a few hours, with most of that time spent unpacking, driving, getting into the server room, and physically connecting things. Upgrades were pretty easy too - reroute customers to a failover node, write a new system image to the old one, reboot, test, reroute traffic back, done.
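In case it's useful, here's roughly what that upgrade cycle looked like as a Python sketch. The host names, image path, and the reroute()/write_image() helpers are placeholders for whatever routing layer and netboot/rescue setup you actually have:

```python
#!/usr/bin/env python3
"""Rough sketch of a failover-based upgrade of a stateless node:
drain traffic, flash a fresh image, reboot, health-check, route back.
Everything named here is a placeholder, not a real environment."""

import subprocess
import sys
import time

NODE = "node-a.example.net"            # node being reimaged (hypothetical)
FAILOVER = "node-b.example.net"        # hot spare taking its traffic (hypothetical)
IMAGE = "/srv/images/access-node.img"  # prebuilt system image (hypothetical)


def reroute(to_host: str) -> None:
    """Stand-in for whatever actually moves customer traffic (VRRP, BGP, DNS...)."""
    print(f"traffic -> {to_host}")


def write_image(host: str, image: str) -> None:
    """Stream a fresh image onto the node's disk while it sits in a
    rescue/netboot environment; here just piped over ssh into dd."""
    with open(image, "rb") as img:
        subprocess.run(["ssh", host, "dd of=/dev/sda bs=4M"],
                       stdin=img, check=True)


def healthy(host: str) -> bool:
    """Minimal health check: the node answers over ssh again."""
    return subprocess.run(
        ["ssh", "-o", "ConnectTimeout=5", host, "true"]
    ).returncode == 0


reroute(FAILOVER)                         # 1. drain the node
write_image(NODE, IMAGE)                  # 2. write the new system image
subprocess.run(["ssh", NODE, "reboot"])   # 3. reboot into it (ssh may drop, that's fine)
time.sleep(120)                           #    crude wait; a retry loop is nicer
if not healthy(NODE):                     # 4. test
    sys.exit("node did not come back, leaving traffic on the failover")
reroute(NODE)                             # 5. route traffic back
```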
So it's not that self-owned bare metal is harder to manage - the lesson I learned is that you just have to think ahead about what the future will require. Same as with the clouds, I guess: you have to follow best practices or you'll end up with a crappy architecture that's painful to rework. Just a different set of practices, because of the different nature of the systems.