No, it's not. You can't distinguish the complete works of Shakespeare from the complete works of Shakespeare with the words changed a bit, or from a version that switches to Harry Potter for a stretch in the middle, or from a completely dazzling and original work of fiction. Any single hash value could correspond to all three of these, and to essentially boundless other works. It's a library of Babel.
Forgive me if I've misunderstood - the premise was that, out of the infinite set of inputs that hash to a particular checksum, there is a unique one that is "obviously" the real data? Or at least, that the set of plausible candidates is meaningfully bounded? My reply is that this is not the case: there will be infinitely many "plausible" inputs as well.
You could collide every hash in existence merely by making undetectably tiny alterations to Shakespeare.
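To put a rough pigeonhole number on that - a sketch, assuming the complete works run to roughly 5 million characters and each altered character has about 25 plausible substitutes (both figures are my assumptions, not anything from upthread) - the count of versions differing in only a handful of characters already dwarfs the 2^160 possible SHA-1 digests, so many of those near-identical texts must share a hash:

```python
from math import comb, log10

TEXT_LEN = 5_000_000      # assumed: ~5 million characters in the complete works
ALTERNATIVES = 25         # assumed: ~25 plausible substitutes per altered character
SHA1_DIGESTS = 2 ** 160   # number of distinct SHA-1 outputs (~1.5 * 10^48)

for k in range(1, 9):
    # number of texts that differ from the original in exactly k characters
    variants = comb(TEXT_LEN, k) * ALTERNATIVES ** k
    side = "more" if variants > SHA1_DIGESTS else "fewer"
    print(f"{k} changed chars: ~10^{log10(variants):.0f} variants ({side} than the ~10^48 SHA-1 digests)")
```

By around seven altered characters the variant count already exceeds the number of possible digests, so collisions among "undetectably tiny alterations" are guaranteed by counting alone.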
Not obviously, no - just with very high probability. Perhaps that provides some degree of noise immunity. If it does, it is a form of AI.
This admittedly open question presumes a very large, fuzzy 'code book' with which to re-assemble the data. The length of the cleartext input is valuable metadata that would speed up the search.
You still seem to be missing the point. The problem is that the SHA-1 hash is one of 2^160 possible values, while a 1GB plaintext is one of 2^8000000000 possible values. That means the hash only narrows your message down to one of 2^7999999840 remaining candidates.
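To make that arithmetic concrete, here's a quick sketch (assuming 1 GB = 10^9 bytes): even after the hash eliminates a factor of 2^160, the expected number of 1 GB preimages per digest is a number with roughly 2.4 billion decimal digits.

```python
from math import log10

GIGABYTE_BITS = 8 * 10**9   # assuming 1 GB = 10^9 bytes = 8,000,000,000 bits
SHA1_BITS = 160

# candidate plaintexts before and after learning the hash, as powers of two
total_exp = GIGABYTE_BITS              # 2^8000000000 possible 1 GB files
remaining_exp = total_exp - SHA1_BITS  # 2^7999999840 still map to the same digest on average

print(f"possible 1 GB plaintexts:      2^{total_exp}")
print(f"expected preimages per digest: 2^{remaining_exp}")
print(f"...a number with about {remaining_exp * log10(2):,.0f} decimal digits")
```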
Let's go smaller. Say our plaintext is a single kilobyte - about half an 80x25 terminal's worth of ASCII. Knowing the SHA-1 hash narrows our search space, yes, but the space is so absurdly large that it only solves 0.0000... (insert over two thousand zeros here) ...0001% of the problem.
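Same sketch for the 1 KB case, to show where that string of zeros comes from: 1024 bytes is 8192 bits, so 2^(8192-160) = 2^8032, or about 10^2418, candidates remain, and knowing the hash has "solved" only about 10^-2416 percent of the guessing problem.

```python
from math import ceil, log10

KILOBYTE_BITS = 1024 * 8    # 8192 bits in a 1 KB plaintext
SHA1_BITS = 160

remaining_exp = KILOBYTE_BITS - SHA1_BITS       # 8032 bits of ambiguity left
candidate_digits = remaining_exp * log10(2)     # ~2418 decimal digits of remaining candidates

# "percentage of the problem solved" = 100% divided by the number of remaining candidates
percent_solved_exp = 2 - candidate_digits       # exponent of that percentage, ~ -2416
leading_zeros = ceil(-percent_solved_exp) - 1   # zeros between the decimal point and the first digit

print(f"remaining 1 KB candidates: 2^{remaining_exp} (roughly 10^{candidate_digits:.0f})")
print(f"share of the problem solved: roughly 10^{percent_solved_exp:.0f} percent")
print(f"written out: a decimal point followed by about {leading_zeros} zeros before the first digit")
```

That works out to roughly 2,415 zeros after the decimal point, which is where the "over two thousand zeros" above comes from.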