Some services, particularly those that have to perform complex operations (usually SQL services), have greater resource requirements than others. So the "homogeneous servers" requirement seems practically implausible to me, at least from a cost-optimization standpoint.
It doesn't seem reasonable to me that we should allocate Big Hardware to, say, an nginx HTTP proxy, just to make it possible (no matter how unlikely) to also host a MySQL container on it at some point in the future.
The homogeneous server recommendation is based on experience running tens of thousands of containers (a mix of application and database containers) in production. At that scale, when you have far more containers than servers, it makes sense to let a system automatically distribute the load across those machines, and getting the scheduling right for that automatic load distribution is easier with relatively (though not necessarily exactly) homogeneous machines.
Of course, every deployment is different, and I'm not categorically saying "never have dedicated database hardware", just "clustering with automatic load balancing is easier with homogeneous hardware". :)
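To make the scheduling point concrete, here's a toy Python sketch. This is my own illustration, not anything from Flocker, and all the field names are made up; the idea is just that with identical machines placement reduces to "pick the least-loaded box", whereas heterogeneous machines force the scheduler to also track capacity and check resource fit.

    # Toy sketch (not Flocker code): why homogeneity simplifies placement.
    # All field names here are illustrative assumptions.

    def place_homogeneous(servers, container_load):
        """Every server has the same capacity, so 'least loaded' is unambiguous."""
        target = min(servers, key=lambda s: s["load"])
        target["load"] += container_load
        return target["name"]

    def place_heterogeneous(servers, container):
        """Now we must filter by fit (cores, RAM) before comparing load, and
        'least loaded' has to be normalised against each machine's size."""
        candidates = [
            s for s in servers
            if s["free_cores"] >= container["cores"]
            and s["free_ram_gb"] >= container["ram_gb"]
        ]
        if not candidates:
            raise RuntimeError("no server can fit this container")
        target = min(candidates, key=lambda s: s["load"] / s["total_cores"])
        target["free_cores"] -= container["cores"]
        target["free_ram_gb"] -= container["ram_gb"]
        target["load"] += container["cores"]
        return target["name"]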
The sweet spot for a server is pretty large: 16-20 cores and 256-384 GB RAM. This could run one MySQL or a lot of Nginxen. I/O is a little harder since you might not want to pay for a million IOPS in every server.
There seems to be a false assumption here that OSes and hardware scale linearly, such that 16 nginx containers would run as efficiently on one physical server as they would on 16 separate smaller servers.
Or that 10Gb Ethernet, which you'd want if you were making such a large number of services co-resident, is common among cloud providers. Open vSwitch can barely handle 1GbE with a high TCP session setup/teardown rate as it is.
Also, the elephant in the room that the linked discussion avoids (just like every other container-based platform discussion, sigh) is service performance. How do you propose to optimize I/O latency in particular?
In HybridCluster we did I/O-based load balancing. In other words, the primary metric used to decide when to juggle a container onto a less loaded machine was the aggregate disk busy % on the ZFS pool. This kept disk-bound workloads nice and fast in general: if a database started hammering the disk, the system would automatically "make room" for it by moving some of the other busy containers on that machine off to quieter machines.
We'd likely do something similar - eventually - with Flocker. :)
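For concreteness, here's a rough Python sketch of that kind of heuristic. To be clear, this is my own illustration rather than actual HybridCluster or Flocker code; the threshold, the callbacks, and the data model are all invented.

    # Rough sketch of I/O-driven rebalancing: shed load from machines whose
    # ZFS pool is too busy, leaving the hottest container in place.
    BUSY_THRESHOLD = 80.0  # aggregate pool busy %, above which we shed load

    def rebalance_once(machines, pool_busy, container_busy_share, migrate):
        """One pass of the heuristic.

        pool_busy(machine)           -> current pool busy % on that machine
        container_busy_share(c)      -> estimated busy-% contribution of container c
        migrate(container, src, dst) -> actually move the container
        """
        busy = {m: pool_busy(m) for m in machines}
        for machine in machines:
            if busy[machine] <= BUSY_THRESHOLD:
                continue
            # Busiest container first; we leave it where it is (e.g. the
            # database that genuinely needs the disk) and move the rest.
            containers = sorted(machine.containers,
                                key=container_busy_share, reverse=True)
            for container in containers[1:]:
                target = min(machines, key=lambda m: busy[m])
                if target is machine or busy[target] >= BUSY_THRESHOLD:
                    break  # nowhere quieter to move it
                migrate(container, machine, target)
                share = container_busy_share(container)
                busy[machine] -= share
                busy[target] += share
                if busy[machine] <= BUSY_THRESHOLD:
                    break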
Anyway, with all the interesting container and deployment technology that's come out in the last few years, I get the feeling we're on the verge of something wonderful.
All the system configuration and applications under version control... Slinging around huge amounts of data using btrfs snapshots... SDN... It seems only a matter of time before entire deployments are completely reproducible, from top to bottom.
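For example, moving a subvolume's data between machines with btrfs snapshots boils down to a read-only snapshot plus btrfs send/receive over ssh. A minimal sketch, with made-up paths and host names:

    # Minimal sketch: replicate a btrfs subvolume to another machine via
    # snapshot + send/receive. Paths and host names are illustrative.
    import subprocess

    def replicate_subvolume(subvol="/var/lib/containers/db",
                            snapshot="/var/lib/containers/db-snap1",
                            remote_host="node2.example.com",
                            remote_dir="/var/lib/containers"):
        # 1. Take a read-only snapshot (btrfs send requires read-only).
        subprocess.check_call(
            ["btrfs", "subvolume", "snapshot", "-r", subvol, snapshot])
        # 2. Stream the snapshot to the remote btrfs filesystem over ssh.
        send = subprocess.Popen(["btrfs", "send", snapshot],
                                stdout=subprocess.PIPE)
        subprocess.check_call(
            ["ssh", remote_host, "btrfs", "receive", remote_dir],
            stdin=send.stdout)
        send.stdout.close()
        if send.wait() != 0:
            raise RuntimeError("btrfs send failed")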