A) How do you catch tarballs that have extra files injected that aren't part of your manifest?
B) What does the performance of this look like? Certainly for traditional HDDs this is going to kill performance, but even for SSDs I think verifying a bunch of small files is going to be less efficient than verifying the tarball.
A wouldn't be an issue since you are checking out a git tag.
B would just be a normal git checkout, which already validates that all the objects are reachable. Git tags (and commits, for that matter) can be signed, and since the signature covers the SHA-1 hash, it validates that the entire tree of commits has not been tampered with. So as long as you trust git to not lie about what it is writing to disk, you have a valid checkout of that tag.
And if you do expect it to lie, why do you expect tar to not lie about what it is unpacking?
I know GitHub had asked that clones from package managers use shallow clones. It wouldn't surprise me if downloading tarballs is similarly beneficial to GitHub, because it's trivially cacheable in a CDN and thus lowers their operational footprint to support package managers.
Well, the simplest way would be to checksum the tarball after decompression. That doesn't need per-file verification, but it relies on the files being put into the tar file in the same order.
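To make that concrete, here is a minimal Python sketch of hashing the decompressed tar stream rather than the compressed bytes. The helper names (`sha256_of_decompressed`, `build_tar`) and the choice of SHA-256 are my own assumptions for illustration, not something any particular package manager is known to do.

```python
import gzip
import hashlib
import io
import tarfile


def sha256_of_decompressed(gz_bytes, chunk_size=64 * 1024):
    """Hash the uncompressed tar stream inside a .tar.gz, chunk by chunk."""
    digest = hashlib.sha256()
    with gzip.GzipFile(fileobj=io.BytesIO(gz_bytes)) as stream:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
    return digest.hexdigest()


def build_tar(files):
    """Hypothetical helper: build an uncompressed tar in memory from
    (name, bytes) pairs, preserving the order given."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files:
            info = tarfile.TarInfo(name)  # mtime, uid, gid default to 0
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


files = [("a.txt", b"alpha"), ("b.txt", b"beta")]
tar_bytes = build_tar(files)

# Same tar stream, two different compression levels.
fast = gzip.compress(tar_bytes, compresslevel=1)
best = gzip.compress(tar_bytes, compresslevel=9)

# Same files, but added to the tar in the opposite order.
reordered = gzip.compress(build_tar(files[::-1]), compresslevel=9)

same_across_levels = sha256_of_decompressed(fast) == sha256_of_decompressed(best)
same_across_order = sha256_of_decompressed(best) == sha256_of_decompressed(reordered)
print(same_across_levels, same_across_order)
```

The checksum survives recompression because only the inner stream is hashed, but it changes as soon as the file order (or any tar header field such as mtime) changes, which is the ordering caveat mentioned above.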
The other method would be having a manifest file with a checksum of every file inside the tar and comparing against it in flight. It could be a simple "read from tar, compare to hash, write to disk" loop (with maybe some tmpfiles for the bigger ones).
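A rough Python sketch of that "read from tar, compare to hash, write to disk" loop follows. Everything here is an assumption for illustration: the manifest is taken to be a dict of member name to SHA-256 hex digest, `extract_verified` and `build_tar` are hypothetical names, and each file goes through a `.part` temp file before being moved into place. It also rejects any member that is not in the manifest, which covers the injected-file case.

```python
import hashlib
import io
import os
import tarfile
import tempfile


def extract_verified(tar_fileobj, manifest, dest_dir):
    """Stream a tar, verifying each regular file against manifest
    ({name: sha256 hex}). Raises ValueError on any discrepancy."""
    seen = set()
    dest_root = os.path.realpath(dest_dir)
    with tarfile.open(fileobj=tar_fileobj, mode="r|*") as tar:
        for member in tar:
            if member.isdir():
                continue
            # Reject links, devices, duplicates, and anything not listed.
            if (not member.isfile() or member.name not in manifest
                    or member.name in seen):
                raise ValueError(f"unexpected member: {member.name}")
            target = os.path.realpath(os.path.join(dest_root, member.name))
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError(f"path escapes destination: {member.name}")
            os.makedirs(os.path.dirname(target), exist_ok=True)
            source = tar.extractfile(member)
            digest = hashlib.sha256()
            tmp_path = target + ".part"
            with open(tmp_path, "wb") as out:
                while True:
                    chunk = source.read(64 * 1024)
                    if not chunk:
                        break
                    digest.update(chunk)
                    out.write(chunk)
            if digest.hexdigest() != manifest[member.name]:
                os.remove(tmp_path)
                raise ValueError(f"hash mismatch: {member.name}")
            os.replace(tmp_path, target)  # only verified data gets its final name
            seen.add(member.name)
    missing = set(manifest) - seen
    if missing:
        raise ValueError(f"missing from archive: {sorted(missing)}")


def build_tar(files):
    """Hypothetical helper: in-memory tar from (name, bytes) pairs."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


files = [("a.txt", b"alpha"), ("b.txt", b"beta")]
manifest = {name: hashlib.sha256(data).hexdigest() for name, data in files}
good_tar = build_tar(files)
bad_tar = build_tar(files + [("evil.sh", b"rm -rf /")])

with tempfile.TemporaryDirectory() as dest:
    extract_verified(io.BytesIO(good_tar), manifest, dest)
    extracted = sorted(os.listdir(dest))

with tempfile.TemporaryDirectory() as dest:
    try:
        extract_verified(io.BytesIO(bad_tar), manifest, dest)
        injected_rejected = False
    except ValueError:
        injected_rejected = True
```

Note that in this sketch files verified before a failure stay on disk, so a real tool would probably extract into a scratch directory and discard it on error. The manifest itself still has to be authenticated some other way (a signature, for example), otherwise an attacker just ships a matching manifest.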
It’s not just about the integrity of the files you’re processing, but also the integrity of the archive itself. If you extract the tarball from a random place, there’s a larger security risk. Now granted HTTPS probably mitigates a lot of it, but cert pinning isn’t that common so MITM attacks aren’t thaaat theoretical.
You can do validation in flight during extraction. Signed file manifests are how distros like Debian have done it since forever, although in their case it's a two-step process: the packages themselves contain their own signature, and the whole directory tree also gets signed (to avoid shenanigans like an attacker putting an older, still vulnerable, but signed version into the repo).