Not so simple. A hash corresponds to infinitely many messages, but how many of t...

hyperpape · on Oct 12, 2021

Ignoring punctuation, capitalization and spaces, you have

(26 ^ 1024) / (2 ^ 160) ascii texts, which is too large for python to do the division.

Thinking harder, that's:

>>> (13 * (160)) * (26 ** (1024 - 160)) 586015382205826960727672544401463871212215837535709025412643135464856100047507808667927549205890681326798481562717865671371914861707033271010401105757243098374422354323280335989456977123883814519788789676409601028611595593846201622073508573113400508407626532918150795408521721900638322896979964584478287370459866751451622413984067802115006844002721198098126119299044037305285971873899685570443731713584345786693414215895940310733413089663439828136700053588516077484775302179979357918243214847406310224703621807862576801557249657421436718532619781248154845297450962161296162654285951879462181582579086975207170674079727507724939279192338763731562331681240184746490930657625088527012496974579965412172847315257089070105214466289197177909463176672954181773739345690793807307314187284671422149249630372138364687234076394633405015375177710973043617082884089261346079718819448519032937091364439564690325014717871376537803759701695837030040255807318892621674241958398030180530294671920834183657715967728968579955163053146309746677182369005157209730333605691785063279407829717681098919238400049142795696195708090541066855775773950462556018168019660329830112434789524857886670766484791538540247307849949305926757172442880517278313785940979709167536140659937329357336156000509320395656047604973761134700992358740522602445520136787846575269569119611476145342798153931437530464566322510167663230985347176517861376

If you restrict it to words, assume that there's only 1000 words in the English language, averaging 5 letters each, you get:

(1000 ^ 200) / (2 ^ 160) which becomes:

>>> (500 ** 160) * (1000 ** 40) 684227765783602085411977335590779360976690401306892466678255997993062052092705371819647552911192178726196289062500000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

There will be massive numbers of realistic texts that match any given hash.

akomtu · on Oct 13, 2021

Good point. I've realized that 2^160 can be just 160 words, each word is a choice of two synonyms.

dTal · on Oct 12, 2021

The pigeonhole principle forbids it, since the hash itself is an ASCII message under 1KB long. Even within your message constraints, the message can contain contain any possible hash, and more besides. Therefore there are more valid messages than hashes.

kragen · on Oct 13, 2021

It's very unlikely for a 160-bit hash to have a unique message under the constraint of it being "in ASCII charset under 1KB long". 1 KiB of printable ASCII including \n and space but not \t or \177 is 96¹⁰²⁴, which is about 2⁶⁷⁴³. Of these, if the hash function is any good at all, almost exactly one out of every 2¹⁶⁰ will have the right hash, leaving you with an expected 2⁶⁵⁸³ distinct ASCII-charset messages under 1KB long. If I calculated it correctly, instead of one unique message with the right hash consisting of under 1 KiB of printable ASCII (possibly with some spaces appended to pad it out to 1 KiB), you have about 47 967 084 939 617 240 088 231 752 005 306 325 884 757 564 806 051 499 378 420 486 072 786 141 026 711 644 389 624 345 361 644 245 593 155 700 554 795 067 612 226 300 255 186 367 434 267 198 328 891 304 541 335 537 513 836 547 400 045 409 787 642 617 075 359 474 632 262 281 099 933 405 444 693 275 797 636 981 359 166 142 551 935 433 456 776 135 242 574 665 937 013 670 423 995 769 980 893 248 768 962 950 653 929 460 806 907 564 741 845 059 869 794 472 249 820 375 550 104 552 506 227 318 738 353 617 584 549 423 076 513 640 935 785 892 067 652 844 586 446 661 520 922 590 520 435 212 971 778 420 191 665 135 704 785 297 868 496 132 194 344 092 532 767 753 539 023 500 656 559 945 585 667 295 477 775 543 847 979 594 651 751 709 542 229 125 925 151 633 946 693 073 932 350 683 231 283 773 613 582 005 763 154 838 864 288 879 888 332 953 553 058 126 791 192 014 388 230 316 497 633 811 926 064 371 458 658 795 966 440 990 101 901 032 836 632 802 434 803 173 156 709 301 006 193 682 249 500 596 835 881 601 627 616 029 997 922 141 423 971 635 916 546 397 289 367 953 405 102 342 661 998 428 622 018 161 995 405 622 814 061 925 210 174 108 063 042 331 122 962 491 864 135 145 051 181 277 545 303 147 381 365 523 778 185 116 460 815 262 263 500 402 481 608 602 600 186 833 670 768 942 448 591 646 439 127 301 400 601 336 130 050 539 973 301 764 212 409 825 855 458 107 341 401 691 305 152 002 782 886 099 703 975 355 852 347 383 236 360 931 712 288 039 396 729 525 475 540 725 954 319 379 177 826 924 167 807 020 037 630 409 522 341 596 693 199 836 065 989 968 868 685 185 086 087 022 273 317 809 678 407 361 940 533 280 300 173 079 635 303 309 555 449 410 556 692 415 681 998 421 397 202 738 917 214 037 435 078 427 359 778 923 359 900 840 055 529 921 692 702 321 542 997 864 543 418 915 511 989 871 945 334 709 916 341 623 655 699 199 160 379 312 900 066 054 592 963 465 517 593 966 290 755 564 533 465 052 063 440 394 099 369 011 964 754 135 849 017 025 315 982 367 652 117 408 473 872 192 766 757 909 754 922 584 465 489 973 138 378 312 812 178 366 379 361 777 540 778 799 973 694 388 412 249 147 549 245 795 058 816 022 744 426 339 281 128 551 995 761 983 244 680 295 344 226 171 135 266 747 723 635 275 591 506 459 047 751 124 514 571 326 309 610 529 999 309 895 628 777 141 756 979 076 770 232 849 272 425 879 893 429 411 541 942 495 076 207 596 350 607 674 491 546 842 819 722 368 514 723 670 390 388 258 639 701 760 252 716 026 232 912 354 822 929 844 920 997 401 287 086 389 559 396 695 747 881 419 688 945 588 175 366 178 512 747 036 422 014 697 873 269 878 782 396 373 076 554 780 366 468 559 467 108 175 337 875 528 158 025 875 456 of them. (Sorry for the number format with spaces; it got chopped when I used commas.)

Most of them will be gibberish; English text has only about 2.3 bits of entropy per letter. So there are only on the order of 2²³⁵⁵ 1KiB messages that look pretty much like English text, of which one in 2¹⁶⁰ will have any particular SHA-1 hash, leaving about 2²¹⁹⁵ more or less English messages of 1 KiB, which is a much more manageable 5 765 546 543 805 391 543 300 385 245 067 897 407 469 008 380 207 694 434 170 981 314 369 415 226 086 245 896 005 497 410 349 176 911 651 361 357 544 908 126 864 379 940 776 407 262 468 025 247 520 821 365 392 566 254 691 849 336 550 399 984 742 144 883 696 325 495 839 942 505 506 308 529 294 485 245 435 346 088 288 415 306 782 152 045 986 880 430 505 821 218 111 120 701 594 573 419 855 327 199 586 861 839 630 511 065 600 663 692 968 681 473 384 074 002 850 142 261 291 497 547 545 795 867 600 142 345 188 353 358 006 378 705 229 284 788 565 040 964 509 510 302 568 387 814 225 873 737 552 804 109 763 080 706 434 267 888 314 149 674 523 819 024 312 546 837 031 915 917 556 591 511 424 773 862 591 940 658 144 814 461 877 029 111 670 089 356 835 845 931 924 493 084 507 666 309 424 365 148 038 224 615 440 025 478 945 269 023 101 615 392 514 882 287 817 384 451 162 838 663 168 messages.

It is unlikely that it would be apparent upon inspection which one was correct, since most of them differ from other messages by only a couple dozen subtle changes.