I've just had a thought. When GitHub does change its archive compression, everyone relying on the tarball hash will have to update their hashes. That is the ultimate opportunity to change the tar contents, affect the supply chain, introduce vulnerabilities, and have everyone trust you. Something like Nix, which computes the NAR Hash (a hash derived from the unpacked contents rather than the compressed file), will not be affected by this, since it only cares about the content. I think this is a much bigger concern than an unlikely tar vulnerability. In a system that only trusts tarball hashes, the original source cannot take advantage of better compression over time without a massive risk of supply chain attack. If you think you can hand me a tarball that can run arbitrary code, for any version of tar that has ever existed, please give it to me so I can experiment with exploits, and I'll buy you a drink of your choice at FOSDEM if you're there!
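To make that concrete, here is a minimal Python sketch. It is not Nix's actual NAR algorithm: `make_tar` and `content_hash` are hypothetical helpers invented for this illustration, and the hash covers only regular-file names and bytes (no permissions, symlinks, or directories). The same tar contents are compressed two different ways, so the hash of the downloaded file changes while the hash of the contents does not.

```python
import gzip
import hashlib
import io
import lzma
import tarfile


def make_tar(files):
    """Build an uncompressed tar archive in memory from {name: bytes}."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def content_hash(archive):
    """Hash the regular files inside a (possibly compressed) tar archive.

    Illustrative scheme only: SHA-256 of each file, listed in name order,
    then SHA-256 of that listing. The container bytes never enter the hash.
    """
    outer = hashlib.sha256()
    # mode="r:*" lets tarfile detect the compression format on its own.
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:*") as tar:
        members = sorted(
            (m for m in tar.getmembers() if m.isfile()),
            key=lambda m: m.name,
        )
        for member in members:
            data = tar.extractfile(member).read()
            digest = hashlib.sha256(data).hexdigest()
            outer.update(f"{digest}  {member.name}\n".encode("utf-8"))
    return outer.hexdigest()


files = {"README": b"hello\n", "src/main.c": b"int main(void) { return 0; }\n"}
raw = make_tar(files)
as_gzip = gzip.compress(raw)  # one way of shipping the tarball
as_xz = lzma.compress(raw)    # "better compression" later on

# The whole-file hashes differ, so every pinned tarball hash would need updating.
print(hashlib.sha256(as_gzip).hexdigest() == hashlib.sha256(as_xz).hexdigest())  # False
# The content hash is the same, so nobody has to re-trust anything.
print(content_hash(as_gzip) == content_hash(as_xz))  # True
```

A real content hash would also need to cover executable bits and symlinks, which is part of what NAR specifies and this sketch leaves out.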
You're not wrong, but you're also not being realistic.
Nix is not the only system that takes this approach. The Go modules "directory hash" is roughly equivalent, although we defined it in terms of somewhat more standard tooling: it is the output of
sha256sum $(find . -type f | sort) | sha256sum
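For readers who would rather see that pipeline spelled out, here is a rough Python sketch of the same idea. It is an approximation of the shell command, not Go's actual implementation: `directory_hash` is a hypothetical name, the sort is bytewise (as `sort` behaves under `LC_ALL=C`), and it assumes file names contain no newlines or backslashes, which `sha256sum` would escape.

```python
import hashlib
import os


def directory_hash(root):
    """Approximate `sha256sum $(find . -type f | sort) | sha256sum` for root."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # `find -type f` matches regular files only, so skip symlinks.
            if os.path.isfile(full) and not os.path.islink(full):
                rel = os.path.relpath(full, root).replace(os.sep, "/")
                paths.append("./" + rel)  # find prints paths as ./a/b

    # Bytewise ordering, matching `sort` in the C locale.
    paths.sort(key=lambda p: p.encode("utf-8"))

    outer = hashlib.sha256()
    for path in paths:
        with open(os.path.join(root, path), "rb") as f:
            digest = hashlib.sha256(f.read()).hexdigest()
        # sha256sum prints "<hex digest>  <path>" with two spaces between.
        outer.update(f"{digest}  {path}\n".encode("utf-8"))
    return outer.hexdigest()
```

Calling `directory_hash(".")` on an unpacked source tree gives a value that depends only on file names and file contents, not on how the tree was archived or compressed.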
I am not here advocating that everyone switch to this basic directory hash either, because it's not a solution to the more general problem that many systems are solving, namely validating _any_ downloaded file, not just file archives.
There are widespread, standard tools to run a SHA256 over a downloaded file, and those tools work on _any_ downloaded file. Essentially every programming language ships with, or has easily accessible libraries for, the same operation. In contrast, there are no widespread, standard tools or libraries for the "NAR Hash" or the Go "directory hash". Even if there were, such tools would need to be able to parse every kind of file that people might be downloading as part of a build, not just tar files.
It's a good solution in limited cases such as Nix and Go modules, but it's not the right end-to-end solution for all cases.
When you say it is not the right end-to-end solution for all cases, I am wondering what case you have in mind that a NAR Hash would not be suitable for.
If you adopt Nix fully, the signed .narinfo file that cache.nixos.org (a Nix substituter) serves contains both the NAR Hash and the hash of the NAR archive file. Additionally, NAR packs and unpacks deterministically, and you can read the implementation in the Nix thesis.
My point is about (1) the broader ecosystem of tools that may need to interoperate and have easy access to "SHA256 the whole file" and (2) the fact that not everything is a tar file that the Nix tools can process. So yes, that's the "only" case.
So what about the IPFS CAR format (https://car.ipfs.io/)? It would fulfill a lot of what I expect from NAR too. NAR or CAR, I don't care: I believe the content is what matters, not the container format.
If I have a box with an apple in it, I don't care about the box, I care about the apple inside. If it's not an apple, I don't want to eat it.