Hacker News

All of these problems are about dependencies.

And dependencies are about the way that we went from a blank slate to a working system.

If you can't retrace that path, you can't debug. If you don't have tools to track the path, you will make mistakes. At best, you can exactly replicate the system that you need to fix -- and fixing is changing.




If I understand correctly, Dockerfile, and image layers, encode that path, making it retrace-able, yes?


Not in general, no. Docker image layers are just snapshots of the filesystem changes that result from build commands. But there is nothing that guarantees that the effects of those commands are reproducible.

For example, it's incredibly common for Dockerfiles to contain commands like:

    RUN apt-get update && apt-get install -y foobar
which means when you build the container, you get whatever version of foobar happens to currently be in the repository that your base image points to. So you can easily end up in a situation where rebuilding an image gives you different behavior, even though nothing in your Dockerfile or build context changed.


Even the Docker support team is confused by this. Most of why I stopped engaging is that they would reject a feature because it led to Dockerfiles they said were 'not reproducible', even though those were just as reproducible as many of the core features.

Which is to say, not in the slightest.

Every time you do something in a Dockerfile that involves pulling something from the network, you don't have reproducibility. Unless you pull something using a cryptographic hash (which I've never actually seen anyone do), the network can give you a different answer every time. The answer might not even be idempotent (calling it once changes the result of calling it again).
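For reference, a hash-pinned fetch looks something like the following. This is only a sketch: the filename is a placeholder, and the network download is simulated by writing a local file so the snippet is self-contained (in a real Dockerfile RUN step you'd curl the artifact and compare it against a hash you recorded when you reviewed it).

```shell
set -e
# Stand-in for the downloaded artifact (a real build would curl it).
printf 'artifact contents\n' > foobar.tar.gz
# The expected hash would normally be recorded ahead of time; here we
# compute it from the stand-in file just to demonstrate the check.
expected=$(sha256sum foobar.tar.gz | cut -d' ' -f1)
# Fail the build if the fetched bytes don't match the recorded hash.
echo "${expected}  foobar.tar.gz" | sha256sum -c -
```

Newer BuildKit Dockerfiles can also do this natively with ADD --checksum=sha256:..., which refuses the fetch if the bytes don't match.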

My argument was the same as yours. Virtually every Dockerfile has an 'update' call as the second or third layer, which means none of the repeatable layers after it (like useradd or mkdir) are actually repeatable. You're building a castle on sand.

Docker images are reproducible. Docker builds are not. And that's okay, as long as everyone accepts that to be true. And if Docker contributors can't accept that then we are all truly fucked, until something else comes along that retreads the ideas.


Sounds like Nix is what you're looking for.


Which can conveniently produce Docker images.


Is there some documentation on exactly how to do this? I've been curious about this in the past, but my search-fu is failing me...


There's a whole chapter on images (VM, docker, appimage) in the manual. I can recommend starting there. https://nixos.org/manual/nixpkgs/stable/#chap-images

The smallest example is:

    pkgs.dockerTools.buildLayeredImage {
      name = "hello";
      contents = [ pkgs.hello ];
    }


Is there really a significant number of folks who are confused by the idea that networks can serve different files under the same name at different times?


> which means when you build the container, you get whatever version of foobar happens to currently be in the repository that your base image points to.

Well no, the layer won't be rebuilt if you already have it from yesterday or last week and that line plus the previous lines in the Dockerfile didn't change. So you would still be using the old version, while someone new to the project would get the most up-to-date one. You have to remember to skip the cache (docker build --no-cache) to get the most recent version.


>Well no, the layer won't be rebuilt if you already have it from yesterday or last week and that + previous lines in the Dockerfile didn't change.

When people say it needs to be reproducible, they don't mean "reproducible in the same host machine, with the same cache lying around".


That's why I like conda's yml files: every dependency is listed with its version number and the conda channel it came from.


You can omit version numbers in conda files as well, though.


yes true! But by default, `conda env export > env.yml` will include the version numbers of the current environment, which is how I usually use it.


Yes, but a few things about this:

1) You can apt-get a specific version.

2) Even if you don't pin a version, two developers who build and then run the container within some time frame will get the same version. It would be rare to get a different one on the same day.

3) Even unpinned, the odds that two developers end up with the same version this way are far and away better than if they were each running two wildly different versions of Ubuntu locally on their desktops.

Also, most developers working on the same codebase will push to their branch, and CircleCI will make it so there's only one Docker image, which is tested and eventually moves along to prod.
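For what it's worth, point 1 looks like this in a Dockerfile. The package name and version here are made-up placeholders; apt-cache madison foobar would show you which versions the configured repositories actually offer.

```dockerfile
# Pin an exact package revision instead of taking whatever is current.
# "foobar" and "1.0.3-2" are hypothetical.
RUN apt-get update && apt-get install -y foobar=1.0.3-2
```

Note this only pins the version string; as the sibling comment points out, it doesn't guarantee the package contents are byte-identical to yesterday's.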


You can apt-get a specific version, but that is not a 100% guarantee that the contents of the package will be identical to what was there yesterday. It is absolutely possible to delete a package from a repo and replace it with another one with the same version.


Distros don't do this and have communally enforced conventions for package revisions that don't involve updates/changes to the upstream source.

Proprietary software vendors and in-house corporate repos might pull sloppy crap here, though.

It is nice when your package manager reliably alerts you to changes like that, though! Binary package managers can't, really (although this feature could be added), but source-based ones often do. Nix and Guix enforce this categorically, including for the upstream source tarballs of source builds. Even Homebrew includes hashes for its binaries, and provides options to enforce them (refusing to install packages whose recipes don't include hashes).


> even if you don't apt-get a specific version, if two developers build and then run the container (within some time frame)

All of these "maybe it'll work" type ideas are what make modern software so brittle.


If I tell you that there's a remote code execution in libfoobar-1.03 through 1.15, how long does it take you to verify where libfoobar is installed, and what versions are in use? Remember, nobody ships an image layer of libfoobar, it's a common component, though not as common as openssl.

Is there one command, or one script, which can do that? You need that, basically daily.

Is there one command to rebuild with the new libfoobar-1.17, and at most one more to test for basic functionality? You need that, too.
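There's no single built-in command, but the audit half is scriptable once you have an inventory. A sketch in shell: the inventory file is hand-written stand-in data (in reality you'd populate it per image with something like docker run --rm IMAGE dpkg-query -W -f '${Version}\n' libfoobar), and the numeric awk comparison only works here because every version has a two-digit minor; real code should use dpkg --compare-versions.

```shell
# Hypothetical inventory: image name -> installed libfoobar version.
cat > inventory.txt <<'EOF'
app-frontend 1.12
worker 1.16
api 1.03
EOF
# Flag anything in the vulnerable 1.03-1.15 range.
awk '$2 >= 1.03 && $2 <= 1.15 { print $1, "VULNERABLE", $2; next }
     { print $1, "ok", $2 }' inventory.txt
```

With this data, app-frontend and api get flagged and worker passes. The rebuild half is the hard part: you still need every image's Dockerfile to be rebuildable on demand.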


I mean, you're not gonna like the answer, but in real life, when herding cats, the answer is setting up an image scanner and Renovate and calling it a day.

It's not like OS images are any better in this respect. I've cut my teeth long enough on software that depends on the base OS and blocks infra upgrades. Bifurcating responsibility into app/platform is a breath of fresh air by comparison.


Yup this is a hard problem. Shameless plug to my blogpost on how we built something to do this: https://eng.lyft.com/vulnerability-management-at-lyft-enforc...


....I can tell you which ones are used by the current linker on the system in what order.

ldconfig -p | grep libfoobar

If you go and spread the linkers all over hither and yon and make it nigh impossible to get results out of them, or don't bother to keep track of where you put things... Welp. Can't help ya there. Shoulda been tracking that all along.

Oh, excuse me... That'll only work for dynamically linked things, in the absence of statically linked stuff or clever LD_PRELOAD shenanigans.

Hope ya don't do that.

Fact is. You're really never going to get away from keeping track of all the moving pieces. You're just shuffling complexity and layers of abstraction around.


Same problem with flatpak. How many versions of openssl are on my laptop? I have no idea!



