What we do is this: upgrades that actually change the data model go out in two phases:
* First, we deploy an upgrade that understands both the old model and the new model, uses the new model internally, and still writes in the old model. This makes the new version 100% compatible with the old version. We launch the new services, test them, add them to the load balancer, and remove the old services from the load balancer.
* Second, we launch another update: it is almost the same as the previous version, except that it also writes its data in the new model. We repeat the same process of launching the new services, adding them to the load balancer, and removing the old ones.
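To make the two phases concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the `User` class, the old model (a single `name` field), the new model (`first_name` and `last_name`), and the `write_new_model` flag. It also assumes that phase two simply switches the write format to the new model; your actual storage layer and migration will look different.

```python
from dataclasses import dataclass


@dataclass
class User:
    """In-memory representation; both phases use the new model internally."""
    first_name: str
    last_name: str


def read_user(record: dict) -> User:
    """Accept a stored record in either the old or the new format."""
    if "first_name" in record:
        # New model: separate fields.
        return User(record["first_name"], record["last_name"])
    # Old model (hypothetical): one "name" field, split on the first space.
    first, _, last = record["name"].partition(" ")
    return User(first, last)


def write_user(user: User, write_new_model: bool) -> dict:
    """Phase 1 is deployed with write_new_model=False, phase 2 with True."""
    if write_new_model:
        return {"first_name": user.first_name, "last_name": user.last_name}
    # Phase 1 keeps writing the old format so the old release can still read it.
    return {"name": f"{user.first_name} {user.last_name}".strip()}
```

In this sketch the phase-one release reads both formats but only ever writes records the old release understands, and the phase-two release writes records that the phase-one release can already read. That is what keeps each release compatible with the one before it.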
This two-phase upgrade has the major advantage that the new services always run next to an older version that is completely compatible, data-model-wise, which allows an emergency rollback to the previous version if required. Adding the new services to the load balancer before removing the old ones also ensures that clients experience no downtime.
All this requires quite a bit of work (especially since you need to deploy multiple releases), so whether to do it depends on how much zero-downtime upgrades are worth to you.