We made a python module to run complicated non-python code in python (like stuff that needs ridiculous environmental gymnastics). It's called sidomo (Simple Docker Module) and it's used to easily make python modules that are actually docker containers (well, docker images). These containers take an input, then hit it with the contained code, and send the output back to pure python.
The hello world:
    from sidomo import Container

    with Container('ubuntu') as c:
        for line in c.run('echo hello from the;echo other side;'):
            print(line)
Going one step further, using http://CommonWL.org you can wrap Dockerized command line tools into callable Python functions, abstracting away all the details of stdin/stdout redirection and of getting files into and out of the container.
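As a rough sketch of that idea (assuming cwltool is installed, and with `tool.cwl` and the input name standing in for a real CWL description of a Dockerized tool):

    import json
    import subprocess

    def run_cwl_tool(cwl_file, **inputs):
        # Build a cwltool invocation; each keyword becomes a "--name value" input.
        args = ["cwltool", cwl_file]
        for name, value in inputs.items():
            args += [f"--{name}", str(value)]
        # cwltool pulls the Docker image, stages files in and out, and prints the
        # tool's output object as JSON on stdout.
        result = subprocess.run(args, capture_output=True, text=True, check=True)
        return json.loads(result.stdout)

    # e.g. outputs = run_cwl_tool("tool.cwl", input_file="sample.fastq")  # hypothetical names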
I've been looking for something like this. My use case is that I have a bunch of really old bioinformatics programs that are a pain to install and I want to run them from a web app. Instead of bundling all of the weird dependencies with the web app, I want to run them in containers using background workers (rails/sidekiq in this case).
This is an awesome use case. Back when I was in particle physics we had to use ROOT (https://root.cern.ch/) for everything, and configuring it in a new environment would take at least a day.
What kind of bioinformatics software are you plugging into a webapp?
That's a neat way to tie both worlds together, and I can see it being useful in cases like testing.
Nonetheless, it is important to distinguish the need to communicate between programs from the need to programmatically run a piece of software like ffmpeg and get its output.
For the second case, especially in more complex architectures where you need to "interact with software written in another language", it makes sense to explicitly separate this interaction, for example through a broker [0]. In the end, all you need is a way for Program A to tell Program B that there is some sort of job to do, and that message can be a simple string pointing to a raw video file in storage like S3, not necessarily the raw file itself.
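A minimal sketch of that hand-off, assuming Redis as the broker (since [0] doesn't prescribe one) and made-up bucket/queue names:

    import json
    import redis  # any queue/broker would do; Redis is just an assumption here

    broker = redis.Redis(host="localhost", port=6379)

    # Program A: enqueue a pointer to the work, not the raw bytes.
    job = {"tool": "ffmpeg", "input": "s3://my-bucket/raw/video.mp4"}
    broker.rpush("jobs", json.dumps(job))

    # Program B (a worker, possibly a Docker/sidomo process): pop and handle it.
    _, payload = broker.blpop("jobs")
    print("got job:", json.loads(payload))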
I'm struggling to see the advantage of this. Surely running the entire thing in the container and just invoking the command with subprocess would achieve the same effect....
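For reference, the subprocess version of the hello world would look something like this (it only works where the tool is installed alongside the Python app):

    import subprocess

    # Same output as the hello world, but run in the app's own environment.
    out = subprocess.run(
        ["sh", "-c", "echo hello from the; echo other side;"],
        capture_output=True,
        text=True,
    )
    for line in out.stdout.splitlines():
        print(line)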
That's totally true, but it means the python app can only run in the same container as the process.
I thought there were essentially 2 really cool features of this: you don't have to clean up after your (sub)processes and your docker daemon could be running remotely. (e.g., you could distribute tasks to a bunch of servers running your containers)
you can! I found it difficult to work with and decided to make something new after reading this post: http://blog.bordage.pro/avoid-docker-py/
It's a few years old but I think docker-py still needs some love for it to really shine.
Absolutely, we use it especially because our servers are linux and our personal machines are mac. Most of our app is not containerized but sidomo helps us make the 'fiddly bits' (e.g., ffmpeg) super portable.
Hmm, this could be a good way to use postgresql during unit tests for python applications, as a cheap alternative to sqlite://:memory: and ramdisks. Cleanup becomes a container management task instead of having to add the postgresql package to the linux distro.
I feel you on the cleanup side--there was another interesting docker app on HN today that you may want to look at if you're running a large DB in your container: https://github.com/muthu-r/horcrux.
Our Container class is built so that if you use the `with` statement, container termination is handled automatically even if there's a program fault.
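So something like this sketch (the raised error is just for illustration) still leaves no container behind:

    from sidomo import Container

    try:
        with Container('ubuntu') as c:
            for line in c.run('echo still cleaned up'):
                print(line)
            raise RuntimeError("simulated program fault")
    except RuntimeError:
        pass  # the with-block exit has already terminated the container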
Your Python app doesn't have to run in a container at all. If you chose to run it in a container, and you wanted to use sidomo, you would have two options:
1. The container would need to be privileged so that it could run a docker daemon and containers (sidomo processes) within itself.
2. (the "right" way) Use the host's docker daemon from within the first container by bind-mounting the host's docker.sock into it. The first container can then start and stop others that run next to it, instead of inside it (see the sketch below). This way there's no recursion, and no container needs root privileges.
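A rough sketch of option 2, using the Docker SDK for Python directly (assumptions: this code runs inside a container that was started with the host's /var/run/docker.sock bind-mounted in, and the `docker` package is installed; sidomo would sit one level above this):

    import docker

    # from_env() picks up the mounted unix socket, so this talks to the HOST daemon.
    client = docker.from_env()

    # Anything started here becomes a sibling container on the host, not a nested one.
    logs = client.containers.run("ubuntu", "echo hello from a sibling", remove=True)
    print(logs.decode())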
Looking at the code, I can't seem to find a way of running the containers with options such as --net, --dns, etc. Am I missing something, or is it just not part of the plan?
It probably wouldn't be much effort to add an options object that is a direct pass-through to the Docker API. That would expose all of the options and help future-proof it.
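Purely as a hypothetical sketch of what that pass-through could look like (none of these keyword arguments exist in sidomo today; the names just mirror Docker's host-config fields):

    from sidomo import Container  # real import; the options below are made up

    with Container(
        'ubuntu',
        network_mode='host',  # hypothetical pass-through for --net
        dns=['8.8.8.8'],      # hypothetical pass-through for --dns
    ) as c:
        for line in c.run('cat /etc/resolv.conf'):
            print(line)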
I'm not a Python developer, but I've had success doing exactly this in node with the `dockerode` module.