Some services, particularly those that have to perform complex operations (usually SQL services), have greater resource requirements than others. So the "homogeneous servers" requirement seems practically implausible to me, at least from a cost-optimization standpoint.
It doesn't seem reasonable to me that we should allocate Big Hardware to, say, an nginx HTTP proxy, just to make it possible (no matter how unlikely) to also host a MySQL container on it at some point in the future.
The homogeneous server recommendation is based on experience running tens of thousands of containers (a mix of application and database containers) in production. At that scale, when you have far more containers than servers, it makes sense to let a system automatically distribute the load across those machines, and getting the scheduling right for that automatic load distribution is easier with relatively (though not necessarily exactly) homogeneous machines.
Of course, every deployment is different, and I'm not categorically saying "never have dedicated database hardware", just "clustering with automatic load balancing is easier with homogeneous hardware". :)
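To make the scheduling point concrete, here's a toy Python sketch. This is my own illustration, not anything from Flocker, and all the field names are made up; the idea is just that with identical machines placement reduces to "pick the least-loaded box", whereas heterogeneous machines force the scheduler to also track capacity and check resource fit.

    # Toy sketch (not Flocker code): why homogeneity simplifies placement.
    # All field names here are illustrative assumptions.

    def place_homogeneous(servers, container_load):
        """Every server has the same capacity, so 'least loaded' is unambiguous."""
        target = min(servers, key=lambda s: s["load"])
        target["load"] += container_load
        return target["name"]

    def place_heterogeneous(servers, container):
        """Now we must filter by fit (cores, RAM) before comparing load, and
        'least loaded' has to be normalised against each machine's size."""
        candidates = [
            s for s in servers
            if s["free_cores"] >= container["cores"]
            and s["free_ram_gb"] >= container["ram_gb"]
        ]
        if not candidates:
            raise RuntimeError("no server can fit this container")
        target = min(candidates, key=lambda s: s["load"] / s["total_cores"])
        target["free_cores"] -= container["cores"]
        target["free_ram_gb"] -= container["ram_gb"]
        target["load"] += container["cores"]
        return target["name"]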
The sweet spot for a server is pretty large: 16-20 cores and 256-384 GB RAM. This could run one MySQL or a lot of Nginxen. I/O is a little harder since you might not want to pay for a million IOPS in every server.
There seems to be a false assumption here that OSes and hardware scale linearly, such that 16 nginx containers would run as efficiently on one physical server as they would on 16 separate smaller servers.
Or that 10Gb Ethernet, which you'd want if you were making such a large number of services co-resident, is common among cloud providers. Open vSwitch can barely handle 1GbE with a high TCP session setup/teardown rate as it is.
Also, the elephant in the room that the linked discussion avoids (just like every other container-based platform discussion, sigh) is service performance. How do you propose to optimize I/O latency in particular?
In HybridCluster we did I/O-based load balancing. In other words, the primary metric used to decide when to juggle a container onto a less loaded machine was the aggregate disk busy % on the ZFS pool. This kept disk-bound workloads nice and fast in general: if a database started hammering the disk, the system would automatically "make room" for it by moving some of the other busy containers on that machine off to quieter machines.
We'd likely do something similar - eventually - with Flocker. :)
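For concreteness, here's a rough Python sketch of that kind of heuristic. To be clear, this is my own illustration rather than actual HybridCluster or Flocker code; the threshold, the callbacks, and the data model are all invented.

    # Rough sketch of I/O-driven rebalancing: shed load from machines whose
    # ZFS pool is too busy, leaving the hottest container in place.
    BUSY_THRESHOLD = 80.0  # aggregate pool busy %, above which we shed load

    def rebalance_once(machines, pool_busy, container_busy_share, migrate):
        """One pass of the heuristic.

        pool_busy(machine)           -> current pool busy % on that machine
        container_busy_share(c)      -> estimated busy-% contribution of container c
        migrate(container, src, dst) -> actually move the container
        """
        busy = {m: pool_busy(m) for m in machines}
        for machine in machines:
            if busy[machine] <= BUSY_THRESHOLD:
                continue
            # Busiest container first; we leave it where it is (e.g. the
            # database that genuinely needs the disk) and move the rest.
            containers = sorted(machine.containers,
                                key=container_busy_share, reverse=True)
            for container in containers[1:]:
                target = min(machines, key=lambda m: busy[m])
                if target is machine or busy[target] >= BUSY_THRESHOLD:
                    break  # nowhere quieter to move it
                migrate(container, machine, target)
                share = container_busy_share(container)
                busy[machine] -= share
                busy[target] += share
                if busy[machine] <= BUSY_THRESHOLD:
                    break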
Anyway, with all the interesting container and deployment technology that's come out in the last few years, I get the feeling we're on the verge of something wonderful.
All the system configuration and applications under version control... Slinging around huge amounts of data using btrfs snapshots... SDN... It seems only a matter of time before entire deployments are completely reproducible, from top to bottom.
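For example, moving a subvolume's data between machines with btrfs snapshots boils down to a read-only snapshot plus btrfs send/receive over ssh. A minimal sketch, with made-up paths and host names:

    # Minimal sketch: replicate a btrfs subvolume to another machine via
    # snapshot + send/receive. Paths and host names are illustrative.
    import subprocess

    def replicate_subvolume(subvol="/var/lib/containers/db",
                            snapshot="/var/lib/containers/db-snap1",
                            remote_host="node2.example.com",
                            remote_dir="/var/lib/containers"):
        # 1. Take a read-only snapshot (btrfs send requires read-only).
        subprocess.check_call(
            ["btrfs", "subvolume", "snapshot", "-r", subvol, snapshot])
        # 2. Stream the snapshot to the remote btrfs filesystem over ssh.
        send = subprocess.Popen(["btrfs", "send", snapshot],
                                stdout=subprocess.PIPE)
        subprocess.check_call(
            ["ssh", remote_host, "btrfs", "receive", remote_dir],
            stdin=send.stdout)
        send.stdout.close()
        if send.wait() != 0:
            raise RuntimeError("btrfs send failed")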