I wonder if this incident will encourage our industry to build more robust forms of artifact integrity verification, or if we will instead codify the status quo of "we guarantee repos to be archived deterministically." To me, the latter seems like a more troubling precedent.
We’ve regressed from the previous norm of open source projects providing stable source tarballs with fixed checksums, sometimes even with cryptographic signatures.
If the source tarball changes, how do you propose downstream tooling distinguishes between data corruption, a MITM attack, and upstream deciding to change the contents without notifying anyone?
That's the whole point: properly versioned source tarballs don't change, and you can get identical copies from any mirror in the world. The SHA-256 of the linux-2.6.10 release is 404e33da7c1bf271e0791cd771d065e19a2b1401ef8ebb481a60ce8ddc73e131, and it won't change.
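A minimal sketch of what that stability buys downstream tooling: pin the digest and verify a copy fetched from any mirror. (Python's hashlib here; the filename is a placeholder, and the comment above doesn't say which compressed variant of the release the quoted digest covers, so treat that as an assumption.)

```python
import hashlib
import sys

# Digest quoted above for the linux-2.6.10 release tarball.
EXPECTED_SHA256 = "404e33da7c1bf271e0791cd771d065e19a2b1401ef8ebb481a60ce8ddc73e131"

def sha256_of(path: str) -> str:
    """Stream the file through SHA-256 so large tarballs need not fit in RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

if __name__ == "__main__":
    path = sys.argv[1]  # e.g. a linux-2.6.10 tarball fetched from any mirror
    actual = sha256_of(path)
    if actual == EXPECTED_SHA256:
        print("OK: matches the pinned release digest")
    else:
        print(f"MISMATCH: got {actual}")
        sys.exit(1)
```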
This is being driven in industry by the US federal government's push (via NIST) for supply-chain verification after the recent hacks.
POTUS issued an executive order and NIST has been following up, leading to the promotion of schemes such as SPDX: https://tools.spdx.org/app/about/
Where I work, we're also required to start documenting our supply chain as part of the new PCI SSF certification requirements (the successor to PA-DSS), which require end-to-end verification of artifacts deployed within PCI scope.
So really, the arguments about CPU time etc. are basically silly. SHA hashes over artifacts that don't change will be a requirement for anyone building industrial software, supplying to government, or working in the money-transacting business.
Oh, I'm not arguing that using checksums (SHA, for example) for integrity verification is a bad idea. That's what they're designed for, after all.
However, I do think it's a bad idea to require the content of compressed archives to be deterministic. tar has never specified an ordering for its contents. Compression algorithms are parameterized to trade time against space, so their output should not be expected to be stable across implementations or versions either. Both of these points apply to zip as well. Yet we now have a situation where we depend on both the archive format and the compression algorithm to produce byte-identical output. If we expect archives to behave this way in general, we set a bad precedent for all sorts of systems, not just git and GitHub.
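A quick illustration of that last point, as a sketch using Python's gzip module (standing in for whatever compressor an archive service actually runs): the same input bytes, compressed with three perfectly valid parameter choices, yield three different compressed streams, so a hash over the compressed archive pins implementation details rather than content.

```python
import gzip
import hashlib

# Identical "archive contents" in all three cases.
data = b"identical archive contents\n" * 1000

# Three valid parameter choices: compression level and header mtime both
# change the byte stream without changing the content.
a = gzip.compress(data, compresslevel=1, mtime=0)
b = gzip.compress(data, compresslevel=9, mtime=0)
c = gzip.compress(data, compresslevel=9, mtime=1234567890)

print(hashlib.sha256(a).hexdigest() == hashlib.sha256(b).hexdigest())  # False
print(hashlib.sha256(b).hexdigest() == hashlib.sha256(c).hexdigest())  # False

# Every variant still decompresses to exactly the same contents.
assert gzip.decompress(a) == gzip.decompress(b) == gzip.decompress(c) == data
print("all variants decompress identically")
```

Which is exactly why pinning a checksum works fine for a frozen release artifact, where the bytes are produced once and mirrored forever, but breaks as soon as anything regenerates the "same" archive on the fly.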