If your org has 100 services, I don't have to have all 100 running at the same time on my local dev machine. I only run the 5 I need for the feature I'm working on.
We have 100 Go services (with redpanda) and a few databases in docker-compose on dev laptops. It works well, and we buy the biggest-memory MacBooks available.
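For reference, the compose setup for something like this is roughly the following shape (images, names and flags here are illustrative, not our actual config):

    services:
      redpanda:
        image: redpandadata/redpanda:latest
        command: redpanda start --smp 1 --overprovisioned
      postgres:
        image: postgres:16
        environment:
          POSTGRES_PASSWORD: dev
      payments:
        build: ./services/payments
        depends_on: [redpanda, postgres]

And "docker compose up -d payments postgres redpanda" brings up just those three if that's all a given feature needs.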
Your success with this strategy correlates more strongly with 'Go' than with '100 services', so it's more anecdotal than generally applicable that you can run 100 services locally without issues. Of course you can.
Buying the biggest MacBook available as a baseline criterion for being able to run a stack locally with Docker Compose does not exactly inspire confidence.
At my last company we switched our dev environment from Docker Compose to Nix on those same MacBooks, and CPU usage went from 300% to <10% overnight.
Have any details on how you've implemented Nix? For my personal projects I use nix without docker and the results are great. However, I was always fearful that nix alone wouldn't quite scale as well as nix + docker for complicated environments.
Hi Jason! Like many others here I'm looking forward to that blog post! :-)
For now, could you elaborate on what exactly you mean by transitioning from docker-compose to Nix? Did you start using systemd to orchestrate services? Were you still using Docker containers? If so, did you build the images with Nix? Etc.
When we used docker-compose we had a CLI tool that developers put on their PATH and that could start/stop/restart services using the regular compose commands. This didn't accomplish much at the time other than being easy to remember and not requiring folks to know where their docker-compose files were located. It also took care of layering in other compose files for overriding variables or service definitions.
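A minimal sketch of that wrapper's shape, with hypothetical paths and file names:

    #!/usr/bin/env bash
    # dev: thin wrapper so nobody has to remember where the compose files live
    set -euo pipefail
    COMPOSE_DIR="$HOME/src/platform/compose"   # hypothetical location
    FILES=(-f "$COMPOSE_DIR/base.yml")
    # layer in an override file if one exists (later -f files win)
    [ -f "$COMPOSE_DIR/overrides.yml" ] && FILES+=(-f "$COMPOSE_DIR/overrides.yml")
    # so "dev up payments", "dev logs payments", "dev restart payments", etc.
    exec docker compose "${FILES[@]}" "$@"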
Short version of the Nix transition: the CLI tool would instead start services using nix-shell invocations behind pm2. So devs still had a way to start services from anywhere, get logs or process status with a command… but every app was running 100% natively.
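To give the flavour, one plausible shape for what the tool runs under the hood (the service name and command are hypothetical, and the exact pm2 flags may differ):

    # run the service natively, inside the env described by the repo's shell.nix
    pm2 start --name payments-api "nix-shell --run 'go run ./cmd/server'"
    pm2 logs payments-api    # tail one service's logs
    pm2 status               # see everything that's running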
At the time I was there, containers weren’t used in production (they were doing “App” deploys still) so there was no Docker target that was necessary/useful outside of the development environment.
Besides the performance benefit, microservices owning their development environment in-repo (instead of in another repo where the compose configs were defined) was a huge win.
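Concretely, "owning the dev environment in-repo" can be as small as a shell.nix checked in next to the code; the package list below is just illustrative:

    # shell.nix at the root of a service repo
    { pkgs ? import <nixpkgs> { } }:
    pkgs.mkShell {
      buildInputs = [
        pkgs.go          # toolchain this service needs
        pkgs.protobuf    # e.g. for regenerating stubs
      ];
      shellHook = ''
        export SERVICE_ENV=dev
      '';
    }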
Several nixy devtools do some process management now. Something we're trying in Flox is per-project services run with process-compose. They automatically shut down when all your activated shells exit, and it feels really cool.
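For anyone unfamiliar, a bare process-compose config is roughly this shape (names hypothetical; with Flox you'd declare the services through its manifest rather than write this file by hand, but the underlying model is the same):

    version: "0.5"
    processes:
      redpanda:
        command: "redpanda start --smp 1"
      payments-api:
        command: "go run ./cmd/server"
        depends_on:
          redpanda:
            condition: process_started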
I've been on this path, and as soon as you work on a couple of concurrent branches you end up with 20 containers on your machine, and getting them all set up to run successfully becomes its own special PITA.
What exactly are the problems created by having a larger number of containers? Since you're mentioning branches, these presumably don't have to all run concurrently, i.e., you're not talking about resource limitations.
Large features can require changing protocols or altering schemas in multiple services. Different workflows can require different services, etc. Keep track of different service versions across a couple of branches (not unusual IMO) and it just becomes messy.
You could still run the proxy they have that lazily boots services - that's a nice optimisation.
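For anyone curious what "lazily boots" means mechanically, here's a very rough Go sketch of the idea: a reverse proxy that starts its upstream the first time a request arrives. This is not Stripe's implementation; the command, ports and service name are made up.

    // lazyproxy: start the upstream service only when someone first asks for it.
    package main

    import (
        "log"
        "net"
        "net/http"
        "net/http/httputil"
        "net/url"
        "os/exec"
        "sync"
        "time"
    )

    const upstreamAddr = "127.0.0.1:8081" // where the lazily started service listens

    var startOnce sync.Once

    // ensureUpstream launches the service on first use and waits for its port to open.
    func ensureUpstream() {
        startOnce.Do(func() {
            cmd := exec.Command("go", "run", "./cmd/payments") // hypothetical service
            if err := cmd.Start(); err != nil {
                log.Fatalf("starting upstream: %v", err)
            }
            for i := 0; i < 100; i++ { // poll until the port accepts connections
                if conn, err := net.DialTimeout("tcp", upstreamAddr, 200*time.Millisecond); err == nil {
                    conn.Close()
                    return
                }
                time.Sleep(200 * time.Millisecond)
            }
            log.Fatal("upstream never came up")
        })
    }

    func main() {
        target, _ := url.Parse("http://" + upstreamAddr)
        proxy := httputil.NewSingleHostReverseProxy(target)
        http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
            ensureUpstream() // boot the service the first time anyone asks for it
            proxy.ServeHTTP(w, r)
        })
        log.Fatal(http.ListenAndServe(":8080", nil))
    }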
I don’t think that many places are in a position where the machines would struggle. They didn’t mention that in the article as a concern - just that they struggled to keep environments consistent (brew install implies some are running on osx etc).
I think it's safe to assume that for something with the scale and complexity of Stripe, it would be a tall order to run all the necessary services on your laptop, even stubs of them. They may not even do that on the dev boxes; I'd be a little surprised if they didn't actually use prod services in some cases, or at any rate a canary, to avoid the hassle of having to maintain on-call for what is essentially a test environment.
I don't know that that's safe to assume. Maybe it is an issue, but it wasn't one of the issues they talk about in the article, nor one of the design goals of the system. They have the proxy / lazy-start system exactly so they can limit how many services are running. That suggests to me that they don't end up needing all of them all the time to get things done.