
I’m not convinced this is entirely true. The upfront cost if you don’t have the skills, sure – it takes time to learn Linux administration, not to mention management tooling like Ansible, Puppet, etc.

But once those are set up, how is it different? AWS is quite clear with their responsibility model that you still have to tune your DB, for example. And for the setup, just as there are Terraform modules to do everything under the sun, there are Ansible (or Chef, or Salt…) playbooks to do the same. For both, you _should_ know what all of the options are doing.

The only way I see this sentiment being true is that a dev team, with no infrastructure experience, can more easily spin up a lot of infra – likely in a sub-optimal fashion – to run their application. When it inevitably breaks, they can then throw money at the problem via vertical scaling, rather than addressing the root cause.




I think this is only true for teams and apps of a certain size.

I've worked on plenty of teams with relatively small apps, and the difference between:

1. Cloud: "open up the cloud console and start a VM"

2. Owned hardware: "price out a server, order it, find a suitable datacenter, sign a contract, get it racked, etc."

Is quite large.

#1 is 15 minutes for a single team lead.

#2 requires the team to agree on hardware specs, get management approval, finance approval, executives signing contracts. And through all this you don't have anything online yet for... weeks?

If your team or your app is large, this probably all averages out in favor of #2. But small teams often don't have the bandwidth or the budget.
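
For concreteness, #1 can be a couple of CLI calls once an account exists. A rough sketch, assuming a default VPC, a key pair, and a security group are already set up (the IDs and names below are placeholders):

    # launch a small VM; AMI ID, key name and security group are placeholders
    aws ec2 run-instances \
        --image-id ami-0123456789abcdef0 \
        --instance-type t3.micro \
        --key-name my-team-key \
        --security-group-ids sg-0123456789abcdef0 \
        --count 1

    # find its public IP once it's running
    aws ec2 describe-instances \
        --filters "Name=instance-state-name,Values=running" \
        --query "Reservations[].Instances[].PublicIpAddress"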


I work for a 50 person subsidiary of a 30k person organisation. I needed a ___domain name. I put in the purchase request and 6 months later eventually gave up, bought it myself and expensed it.

Our AWS account is managed by an SRE team. It’s a 3 day turnaround process to get any resources provisioned, and if you don’t get the exact spec right (you forgot to specify the IOPS on the volume? Oops), that’s another 3 day turnaround. Already started work when you request an adjustment? Better hope that as part of your initial request you specified backups correctly, or you’re starting again.

The overhead is absolutely enormous, and I actually don’t even have billing access to the AWS account that I’m responsible for.


> 3 day turnaround process to get any resources provisioned

Now imagine having to deal with procurement to purchase hardware for your needs. 6 months later you have a server. Oh you need a SAN for object storage? There goes another 6 months.


At a previous job we had some decent on-prem resources for internal services. The SRE guys had a bunch of extra compute, and you would put in a ticket for a certain amount of resources (2 CPUs, SSD, 8GB memory, x2 on different hosts). There wasn’t a massive amount of variability between the hardware; you just requested resources to be allocated from a bunch of hypervisors. Turnaround time was about 3 days too. Except you weren’t required to be self-sufficient in AWS terminology to request exactly what you needed.


> Our AWS account is managed by an SRE team.

That's an anti-pattern (we call it "the account") in the AWS architecture.

AWS internally just uses multiple accounts, so a team can get their own account with centrally-enforced guardrails. It also greatly simplifies billing.


That’s not something that I have control over or influence over.


Managing cloud without a dedicated resource is a form of resource creep, with shadow labour costs that aren’t factored in.

How many things don’t end up happening because of this, when all they need at the start is a sliver of resources?


You're assuming that hosting something in-house implies that each application gets its own physical server.

You buy a couple of beastly things with dozens of cores. You can buy twice as much capacity as you actually use and still be well under the cost of cloud VMs. Then it's still VMs and adding one is just as fast. When the load gets above 80% someone goes through the running VMs and decides if it's time to do some house cleaning or it's time to buy another host, but no one is ever waiting on approval because you can use the reserve capacity immediately while sorting it out.
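
As a rough sketch of how fast that is in practice, assuming a libvirt/KVM host with a prepared template VM (all names here are hypothetical):

    # clone a pre-built, shut-down template and start it
    virt-clone --original debian12-template --name app-vm-07 --auto-clone
    virsh start app-vm-07
    virsh domifaddr app-vm-07   # grab its IP once DHCP/the guest agent reports it

That’s minutes of work, and nobody outside the team needs to be involved.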


The SMB I work for runs a small on-premise data center that is shared between teams and projects, with maybe 3-4 FTEs managing it (the respective employees also do dev and other work). This includes self-hosting email, storage, databases, authentication, source control, CI, ticketing, company wiki, chat, and other services. The current infrastructure didn’t start out that way and developed over many years, so it’s not necessarily something a small startup can start out with, but beyond a certain company size (a couple dozen employees or more) it shouldn’t really be a problem to develop that, if management shares the philosophy. I certainly find it preferable culturally, if not technically, to maximize independence in that way, have the local expertise and much better control over everything.

One (the only?) indisputable benefit of cloud is the ability to scale up faster (elasticity), but most companies don’t really need that. And if you do end up needing it after all, then it’s a good problem to have, as they say.


Your last paragraph identifies the reason that running their own hardware makes sense for Fastmail. The demand for email is pretty constant. Everyone does roughly the same amount of emailing every day. Daily load is predictable, and growth is predictable.

If your load is very spiky, it might make more sense to use cloud. You pay more for the baseline, but if your spikes are big enough it can still be cheaper than provisioning your own hardware to handle the highest loads.

Of course there's also a possible hybrid approach: you run your own hardware for base load and augment with cloud for spikes. But that's more complicated.


I’ve never worked at a company with these particular problems, but:

#1: A cloud VM comes with an obligation for someone at the company to maintain it. The cloud does not excuse anyone from doing this.

#2: Sounds like a dysfunctional system. Sure, it may be common, but a medium sized org could easily have some datacenter space and allow any team to rent a server or an instance, or to buy a server and pay some nominal price for the IT team to keep it working. This isn’t actually rocket science.

Sure, keeping a fifteen-year-old server working safely is a chore, but so is maintaining a fifteen-year-old VM instance!


The cloud is someone else’s computer.

Renting a VM from a provider, or installing a hypervisor on your own equipment, is another thing.


Obligation? Far from it. I've worked at some poorly staffed companies. Nobody is maintaining old VMs or container images. If it works, nobody touches it.

I worked at a supposedly properly staffed company that had raised hundreds of millions in investment, and it was the same thing. VMs running 5 year old distros that hadn't been updated in years. 600 day uptimes, no kernel patches, ancient versions of Postgres, Python 2.7 code everywhere, etc. This wasn't 10 years ago. This was 2 years ago!


There is a large gap between "own the hardware" and "use cloud hosting". Many people rent the hardware, for example, and you can use managed databases, which is one step up from "starting a VM".

But your comparison isn't fair. The difference between running your own hardware and using the cloud (which is perhaps not even the relevant comparison but let's run with it) is the difference between:

1. Open up the cloud console, and

2. You already have the hardware, so you just run "virsh" or, more likely, do nothing at all, because you own the API and have already included this in your Ansible or Salt or whatever you use for setting up a server.

Because ordering a new physical box isn't really comparable to starting a new VM, is it?


I've always liked the theory of #2, I just haven't worked anywhere yet that has executed it well.


Before the cloud, you could get a VM provisioned (virtual servers) or a couple of apps set up (LAMP stack on a shared host ;)) in a few minutes over a web interface already.

"Cloud" has changed that by providing an API to do this, thus enabling IaC approach to building combined hardware and software architectures.


You have omitted the option between the two, which is renting a server. No hardware to purchase, maintain or set up. Easily available in 15 minutes.


While I did say "VM" in my original comment, to me this counts as "cloud" because the UI is functionally the same.


3. "Dedicated server" at any hosting provider

Open their management console, press order now, 15 mins later get your server's IP address.


For purposes of this discussion, isn't AWS just a very large hosting provider?

I.e. most hosting providers give you the option for virtual or dedicated hardware. So does Amazon (metal instances).

Like, "cloud" was always an ill-defined term, but in the case of "how do I provision full servers" I think there's no qualitative difference between Amazon and other hosting providers. Quantitative, sure.


> Amazon (metal instances)

But you still get nickel & dimed and pay insane costs, including on bandwidth (which is free in most conventional hosting providers, and overages are 90x cheaper than AWS' costs).


Qualitatively, AWS is greedy and nickel-and-dimes you to death. Their Route53 service doesn't even have all the standard DNS options I need, which I can get everywhere else, or even on my own server running bind9. I don't use IPv6 for several reasons, so when AWS decided to charge for IPv4, I went looking elsewhere for my VMs.

I can't even imagine how much the US Federal Government is costing American taxpayers in AWS hosting fees; it has to be astronomical.


Out of curiosity, which DNS record types do you need that Route53 doesn't support?


More like 15 seconds.


You gave me flashbacks to a far worse bureaucratic nightmare with #2 in my last job.

I supported an application with a team of about three people for a regional headquarters in the DoD. We had one stack of aging hardware that was racked, on a handshake agreement with another team, in a nearby facility under that other team's control. We had to periodically request physical access for maintenance tasks and the facility routinely lost power, suffered local network outages, etc. So we decided that we needed new hardware and more of it spread across the region to avoid the shaky single-point-of-failure.

That began a three year process of: waiting for budget to be available for the hardware / license / support purchases; pitching PowerPoints to senior management to argue for that budget (and getting updated quotes every time from the vendors); working out agreements with other teams at new facilities to rack the hardware; traveling to those sites to install stuff; and working through the cybersecurity compliance stuff for each site. I left before everything was finished, so I don't know how they ultimately dealt with needing, say, someone to physically reseat a cable in Japan (an international flight away).


There is a middle ground between the extremes of that pendulum, all cloud or all physical metal.

You can start by using a cloud only for VMs and only running services on it via IaaS or PaaS. Very serviceable.


You can get pretty far without any of that fancy stuff. You can get plenty done by using parallel-ssh and then focusing on the actual thing you develop instead of endless tooling and docker and terraform and kubernetes and salt and puppet and ansible. Sure, if you know why you need them and what value you get from them, OK. But many people just do it because it's the thing to do...
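
A sketch of that minimal approach, assuming a plain hosts.txt with one hostname per line and a hypothetical "app" service unit:

    # patch every host in one go
    parallel-ssh -h hosts.txt -i 'sudo apt-get update && sudo apt-get -y upgrade'
    # push a release and restart the (hypothetical) service
    parallel-scp -h hosts.txt app.tar.gz /tmp/
    parallel-ssh -h hosts.txt -i 'sudo tar -C /opt/app -xzf /tmp/app.tar.gz && sudo systemctl restart app'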


Do you need those tools? It seems that for fundamental web hosting, you need your application server, nginx or similar, postgres or similar, and a CLI. (And an interpreter etc if your application is in an interpreted lang)


I suppose that depends on your RTO. With cloud providers, even on a bare VM, you can to some extent get away with having no IaC, since your data (and therefore config) is almost certainly on networked storage, which is redundant by design. If an EC2 instance fails, or even if one of the drives backing your EBS volume fails, it'll probably come back up as it was.

If it's your own hardware and you don't have IaC of some kind – even something as crude as a shell script – then a failure may well mean you need to manually set everything up again.
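
For illustration, such a crude script might be no more than the following (package names, paths, and the backup ___location are all placeholders); the point is that rebuilding the box becomes a rerun rather than an archaeology exercise:

    #!/bin/sh
    set -eu
    # rebuild a basic web host from scratch; names and paths are illustrative
    apt-get update
    apt-get install -y nginx postgresql python3 python3-venv
    # restore configuration kept in version control
    cp config/nginx-site.conf /etc/nginx/sites-available/app
    ln -sf /etc/nginx/sites-available/app /etc/nginx/sites-enabled/app
    systemctl reload nginx
    # restore the latest database dump from wherever backups live
    sudo -u postgres createdb app || true
    sudo -u postgres psql app < backups/latest.sql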


All EBS volumes except io2 have advertised durability of 99.8%, which is pretty low, so don't count it in the magic networked storage category.


Get two servers (or three, etc)?


Well, sure – I was trying to do a comparison in favor of cloud, because the fact that EBS Volumes can magically detach and attach is admittedly a neat trick. You can of course accomplish the same (to a certain scale) with distributed storage systems like Ceph, Longhorn, etc. but then you have to have multiple servers, and if you have multiple servers, you probably also have your application load balanced with failover.


For fundamentals, that list is missing:

- Some sort of firewall or network access control. Being able to say "allow http/s from the world (optionally minus some abuser IPs that cause problems), and allow SSH from developers (by IP, key, or both)" at a separate layer from nginx is prudent. Can be iptables config on servers or a separate firewall appliance (see the sketch at the end of this comment).

- Some mechanism of managing storage persistence for the database, e.g. backups, RAID, data files stored on fast network-attached storage, db-level replication. Not losing all user data if you lose the DB server is table stakes.

- Something watching external logging or telemetry to let administrators know when errors (e.g. server failures, overload events, spikes in 500s returned) occur. This could be as simple as Pingdom or as involved as automated alerting based on load balancer metrics. Relying on users to report downtime events is not a good approach.

- Some sort of CDN, for applications with a frontend component. This isn't required for fundamental web hosting, but for sites with a frontend and even moderate (10s/sec) hit rates, it can become required for cost/performance; CDNs help with egress congestion (and fees, if you're paying for metered bandwidth).

- Some means of replacing infrastructure from nothing. If the server catches fire or the hosting provider nukes it, having a way to get back to where you were is important. Written procedures are fine if you can handle long downtime while replacing things, but even for a handful of application components those procedures get pretty lengthy, so you start wishing for automation.

- Some mechanism for deploying new code, replacing infrastructure, or migrating data. Again, written procedures are OK, but start to become unwieldy very early on ('stop app, stop postgres, upgrade the postgres version, start postgres, then apply application migrations to ensure compatibility with new version of postgres, then start app--oops, forgot to take a postgres backup/forgot that upgrading postgres would break the replication stream, gotta write that down for next time...').

...and that's just for a very, very basic web hosting application--one that doesn't need caches, blob stores, or the ability to quickly scale out application server or database capacity.

Each of those things can be accomplished the traditional way--and you're right, that sometimes that way is easier for a given item in the list (especially if your maintainers have expertise in that item)! But in aggregate, having a cloud provider handle each of those concerns tends to be easier overall and not require nearly as much in-house expertise.
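
To make the first item in the list concrete, a minimal iptables sketch of "web from anywhere, SSH only from known addresses"; the SSH source range below is a placeholder from the documentation address space:

    # keep loopback and already-established traffic working
    iptables -A INPUT -i lo -j ACCEPT
    iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
    # http/https from the world
    iptables -A INPUT -p tcp -m multiport --dports 80,443 -j ACCEPT
    # ssh only from the developers' range (placeholder)
    iptables -A INPUT -p tcp --dport 22 -s 203.0.113.0/24 -j ACCEPT
    # drop everything else inbound
    iptables -P INPUT DROP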


I have never ever worked somewhere with one of these "cloud-like but custom on our own infrastructure" setups that didn't leak infrastructure concerns through the abstraction, to a significantly larger degree than AWS.

I believe it can work, so maybe there are really successful implementations of this out there, I just haven't seen it myself yet!


You are focusing on technology. And sure of course you can get most of the benefits of AWS a lot cheaper when self-hosting.

But when you start factoring internal processes and incompetent IT departments, suddenly that's not actually a viable option in many real-world scenarios.


Exactly. With the cloud you can suddenly do all the things your tyrannical Windows IT admin has been saying are impossible for the last 30 years.


It is similar to cooking at home vs ordering cooked food every day. If someone guarantees the taste and quality, people would be happy to outsource it.


All of that is... completely unrelated to the GP's post.

Did you reply to the right comment? Do you think "politics" is something you solve with Ansible?


> Cloud expands the capabilities of what one team can manage by themselves, enabling them to avoid a huge amount of internal politics.

It's related to the first part. Re: the second, IME if you let dev teams run wild with "managing their own infra," the org as a whole eventually pays for that when the dozen bespoke stacks all hit various bottlenecks, and no one actually understands how they work, or how to troubleshoot them.

I keep being told that "reducing friction" and "increasing velocity" are good things; I vehemently disagree. It might be good for short-term profits, but it is poison for long-term success.


> I keep being told that "reducing friction" and "increasing velocity" are good things

As always, good rules are good, and bad rules are bad.

Like most people on the internet, you are assuming only one of those sets exists. But you are just assuming a different set from everybody you are criticizing.



