Git uses SHA-1 hashes, which have not been considered cryptographically secure s...

KMag · on April 22, 2014

SHA-1 has been shown to not be collision-resistant. Correct me if you've heard otherwise, but I believe SHA-1 is still believed to be second-preimage resistant.

In other words, with fewer than 2^80 trials, an attacker can generate two files that have the same SHA-1 hash. (In the case of C source files, this probably means embedding a nonsense comment in the middle of each file, probably several hundred to several thousand ASCII characters long.) If an attacker can get the very carefully constructed benign file past code review and has the ability to modify the repository, they can substitute the very carefully crafted malicious file for the benign one without changing the root of the Merkle tree.

Assuming that SHA-1 is still second-preimage resistant, it will take an attacker about 2^159 attempts to come up with a file that has the same hash as a legitimate file not carefully constructed by the attacker.
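
For anyone who wants to see what "the same hash" means concretely, here is a minimal sketch of how Git derives the ID of a file (a blob): it hashes a short header, "blob", the content length, and a null byte, followed by the content itself. The function name and the two sample C snippets are made up for illustration; only the header format is Git's.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git would assign to a blob with this content."""
    # Git prepends "blob <length in bytes>\0" before hashing the content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Hypothetical file contents, differing by a single character.
benign = b"int main(void) { return 0; }\n"
tampered = b"int main(void) { return 1; }\n"

print(git_blob_id(benign))
print(git_blob_id(tampered))
```

A second-preimage attack would mean finding a `tampered` whose ID equals that of an existing `benign` the attacker did not get to choose. A collision attack only requires that the attacker can pick both files.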

So, the weaknesses in SHA-1 probably mean that exploiting those weaknesses in the context of git still requires a mole in the development team. Though, I wouldn't want to bet my life on nobody noticing a big nonsense comment in the middle of a C file or someone figuring out how to construct a reasonably reviewable C file as the carefully crafted benign file.

In any case, even with its weaknesses, SHA-1 likely still poses a significant obstacle to forging a git history without planting a mole in the dev team. That's much better than having no cryptographic barrier to forgery.

XorNot · on April 22, 2014

Conversely, there are other signing techniques. GPG-signed tags are an officially supported method.

Probably more importantly though, Git encourages everyone to have the full repository lying around. Even if you inserted a vulnerability in a master, there would still be thousands of copies of code which could be independently compared to find the exact changes which were made.

wolf550e · on April 22, 2014

I think gpg signs just the sha1 the tag points to (root of merkle tree). Also, when comparing local repo against remote repo during fetch, I think git assumes that as long as the sha1 of a commit did not change, there is no need to compare further. So the substitution will not get propagated to people who do "git pull" but people who do "git clone" will get it.

floatboth · on April 22, 2014

Linus Torvalds: "the point is the SHA-1, as far as Git is concerned, isn't even a security feature. It's purely a consistency check. The security parts are elsewhere, so a lot of people assume that since Git uses SHA-1 and SHA-1 is used for cryptographically secure stuff, they think that, OK, it's a huge security feature. It has nothing at all to do with security, it's just the best hash you can get."

gnoway · on April 22, 2014

The edit button is gone. I guess these expire?

My reply was not intended as an attack on Git. I use it daily and would choose it 10 times out of 10 vs. CVS for a new project. I just think the assertion that Git 'saved' Linux from some backdooring attempts because it's decentralized and uses cryptographic hashes is wrong; it's not the tools that make this happen, it's the processes around the use of these tools which do that.

I don't know any OpenBSD developers nor do I have any inside knowledge of how their team works, but I know from observation that they are a small team with high standards for code style and quality. They don't just let anyone commit code and appear to be thorough with code review. When procedural/practice problems are identified in the industry, they are proactive about mitigating or fixing those. They have a demonstrated track record of good releases. Basically, I don't see any reason to question their use of CVS.

TacticalCoder · on April 22, 2014

(first note that I didn't assert anything: I asked question(s) and used "IIRC" etc.)

I found the story again and things are, IMHO, actually quite interesting... If only because the attempt was made after someone ill-intentioned gained access to Linux's CVS repository.

Back then Linux was still using BitKeeper (decentralized) for Linus hadn't created Git yet (so I was not remembering things correctly here). But apparently some people didn't like BitKeeper so there was a CVS clone of the BitKeeper version. And it's in the CVS repo that the attempt took place (after someone hacked his way into the server hosting the CVS repo).

Here's the story:

https://freedom-to-tinker.com/blog/felten/the-linux-backdoor...

Now even though Linus didn't choose SHA-1 for its cryptographic properties and even if SHA-1 is not SHA-256 nor SHA-3, it still looks like an attacker gaining access to a CVS repo would have a much easier time inserting a backdoor than an attacker gaining access to DVCS using cryptographic hashes (which user KMag here explained nicely).
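
To make that last point concrete, here is a toy sketch of a hash-chained history. It is a simplified model and not Git's actual commit format: each ID is the SHA-1 of the parent's ID plus the snapshot content, and the function name and sample snapshots are invented for illustration.

```python
import hashlib


def chain_ids(snapshots: list[bytes]) -> list[str]:
    """Toy model: each ID covers the snapshot content and the parent's ID."""
    ids = []
    parent = ""
    for snap in snapshots:
        parent = hashlib.sha1(parent.encode("ascii") + b"\0" + snap).hexdigest()
        ids.append(parent)
    return ids


# Hypothetical histories: the forged one rewrites the middle snapshot.
history = [b"v1", b"v2", b"v3"]
forged = [b"v1", b"v2 with backdoor", b"v3"]

original_ids = chain_ids(history)
forged_ids = chain_ids(forged)

print(original_ids[0] == forged_ids[0])    # True: untouched prefix keeps its ID
print(original_ids[-1] == forged_ids[-1])  # False: the tip ID changes
```

In this model, rewriting an old snapshot changes every later ID, so anyone holding the old tip ID can tell that history was altered unless the attacker also finds a hash collision. A plain CVS server offers no comparable check.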

boklm · on April 22, 2014

> Basically, I don't see any reason to question their use of CVS.

Why not?

With CVS, security relies on the security of a single server. Anybody with root access to the CVS server can modify history, and nobody would notice.

rpdillon · on April 22, 2014

> Git uses SHA-1 hashes, which have not been considered cryptographically secure since 2005.

That's a vast oversimplification. There are still no publicly known preimage or second-preimage attacks against SHA-1. Even the collision attack in 2005 was limited in that it only reduced the search from a brute-force 2^80 to 2^69.
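
A quick bit of arithmetic puts those exponents in perspective. The hash rate below is purely an assumed round number for illustration, not a measurement of any real hardware.

```python
brute_force_collision = 2 ** 80  # generic birthday bound for a 160-bit hash
attack_2005 = 2 ** 69            # collision cost claimed in 2005

# How much cheaper the 2005 attack is than brute force.
speedup = brute_force_collision // attack_2005
print(speedup)  # 2048

# ASSUMPTION: a hypothetical attacker computing 10**12 SHA-1 hashes per second.
assumed_rate = 10 ** 12
seconds_per_year = 365.25 * 24 * 3600
years_for_attack = attack_2005 / assumed_rate / seconds_per_year
print(round(years_for_attack, 1))
```

So the 2005 result is a factor of 2^11 = 2048 better than brute force. That is a real weakness, but still a large amount of work, and it applies only to collisions, not to preimages.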

Perhaps you're referring to a length extension attack, but I admit I don't know much about those in the context of Git's use of SHA-1.

KMag · on April 22, 2014

In terms of C source files, unless the first C source file being extended is at least 64 petabytes long, a length extension attack is going to embed null bytes in the C source file. I don't know chapter and verse of any of the C standards or GCC/Clang extensions, but I wouldn't be surprised if even string literals and comments including nulls cause problems for both GCC and Clang.

Anyone care to chime in/experiment with ways to embed nulls in C files such that either GCC or Clang will continue compiling code after hitting a null byte in the middle of a file? (I'm not talking about an escaped null in a character or string literal, but actually a 00 showing up in hexdump -C of the source.)

EDIT: I think it's safe to assume people will notice someone trying to sneak a 64-petabyte C source file into the codebase. With apologies to Sweet Brown, ain't nobody got time for [downloading] that.
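
For the curious, here is a minimal sketch of where those null bytes come from. SHA-1 pads every message with a 0x80 byte, then zero bytes, then the message length in bits as a 64-bit big-endian integer. A length extension leaves that padding embedded in the middle of the extended file. The function name and the 1000-byte file size are just illustrative.

```python
import struct


def sha1_padding(message_len: int) -> bytes:
    """Padding SHA-1 appends to a message of message_len bytes."""
    # 0x80, then enough zero bytes that the total length is 56 mod 64,
    # then the original length in bits as a 64-bit big-endian integer.
    zeros = (55 - message_len) % 64
    return b"\x80" + b"\x00" * zeros + struct.pack(">Q", message_len * 8)


# Hypothetical 1000-byte source file.
pad = sha1_padding(1000)
print(pad.hex())
print((1000 + len(pad)) % 64)  # 0: padded message fills whole 64-byte blocks
print(b"\x00" in pad)          # True
```

Because the length is stored as a 64-bit count, its high-order bytes are zero for any file of ordinary size, on top of whatever zero bytes the block alignment requires.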

kshahkshah · on April 22, 2014

"I am completely comfortable with them using whichever tools they want."

Forgive me for saying so, and I doubt you intend it this way, but statements like this are what get us into trouble. At a certain point we need to trust the people that build the foundation for us, but only once we've done our due diligence. Most people, maybe not you since you seem to know more about the team, have not done so yet.