We can do password guessing way better than brute force by generating strings using a Markov model instead of randomly. I wrote a paper on how to do this a few years ago: http://www.cs.utexas.edu/~shmat/abstracts.html#pwd Some of the techniques have since found their way into John the Ripper (and possibly other tools; I haven't checked). The upshot is that you can't calculate strength with a simple formula like this. At best you get an upper bound on the time needed to crack, and that bound can be too high by a few orders of magnitude.
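To make the Markov idea concrete, here is a minimal Python sketch. It is not the method from the paper or what John the Ripper does; it is just a first-order character model with add-one smoothing, trained on a tiny made-up sample, and it ranks candidates by brute enumeration, which only works for toy sizes. The names `train`, `log_prob`, `guesses` and the `START` marker are all hypothetical.

```python
import heapq
import itertools
import math
from collections import Counter, defaultdict

START = "^"  # hypothetical start-of-string marker, assumed absent from passwords

def train(words):
    """Count character-to-character transitions, including START -> first char."""
    counts = defaultdict(Counter)
    for word in words:
        prev = START
        for ch in word:
            counts[prev][ch] += 1
            prev = ch
    return counts

def log_prob(model, candidate, alphabet_size):
    """Log-probability of a string under the model, with add-one smoothing."""
    total = 0.0
    prev = START
    for ch in candidate:
        row = model.get(prev, {})
        seen = sum(row.values())
        total += math.log((row.get(ch, 0) + 1) / (seen + alphabet_size))
        prev = ch
    return total

def guesses(model, alphabet, length, limit):
    """Return the `limit` most probable strings of a fixed length.

    Enumerates every string, so this is only feasible for toy alphabets.
    """
    candidates = ("".join(t) for t in itertools.product(alphabet, repeat=length))
    return heapq.nlargest(
        limit, candidates, key=lambda s: log_prob(model, s, len(alphabet))
    )

# Tiny illustrative training set; a real one would be a large password corpus.
sample = ["password", "passport", "pass"]
model = train(sample)
alphabet = sorted(set("".join(sample)))
print(guesses(model, alphabet, 4, 5))
```

The point is only that likely strings get tried first, so the effective search space is much smaller than the nominal one.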
I created wordlists from Wikipedia database dumps some time ago (http://benjamin-schweizer.de/files/wordlist-wikipedia/); they are pretty large and thus useful for dictionary attacks. The wordlists are sorted, with common words at the top of the lists.
I think that there is a typical password length, so you could improve the sorting based upon a multi-dimensional rating scheme. I'd use expected password length and commonness of a word as factors. Mixing these real words with computer-generated words might speed up brute force attacks.
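A rough Python sketch of such a rating scheme, under loud assumptions: the length prior below is invented for illustration (it is not measured data), commonness is approximated as 1/rank from the position in the sorted list, and `score` and `reorder` are hypothetical names.

```python
import math

# Assumed, illustrative prior over password lengths; NOT measured data.
LENGTH_PRIOR = {6: 0.25, 7: 0.20, 8: 0.30, 9: 0.10, 10: 0.05}
DEFAULT_PRIOR = 0.01  # assumed weight for any other length

def score(word, rank):
    """Combine commonness (approximated as 1/rank) with the length prior."""
    commonness = 1.0 / rank
    length_weight = LENGTH_PRIOR.get(len(word), DEFAULT_PRIOR)
    return math.log(commonness) + math.log(length_weight)

def reorder(wordlist):
    """Re-sort a frequency-ordered wordlist by the combined score, best first."""
    scored = [(score(word, rank), word) for rank, word in enumerate(wordlist, start=1)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [word for _, word in scored]

print(reorder(["the", "of", "football", "princess"]))
```

With these made-up weights, short function words drop below eight-letter words even though they are more frequent.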
However, I'm not sure how to integrate ordered wordfiles with rainbow tables. Any ideas?
I don't think word frequency is a good estimator for the likelihood of passwords. Many frequently used words -- like connectors or adverbs -- are unlikely to be used as passwords. I expect proper names (of people, places, or cultural works) are the most common passwords, and those are at a relative disadvantage in word frequency lists.
rainbow tables are an implementation of the time-space tradeoff concept: you are trying to search through a space so large you cannot enumerate it. if you have already enumerated it, as in a wordlist, it is not meaningful to use rainbow tables. it's not a question of how; it's not even a well-defined operation.
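to make the tradeoff concrete, here is a toy sketch in Python. everything in it is an assumption for illustration: the keyspace is just the 4-digit PINs, the hash is plain MD5, and `H`, `R`, `build` and `lookup` are hypothetical names, not any real tool's API. the table keeps only chain start and end points, and the lookup pays for that by recomputing chains.

```python
import hashlib

CHAIN_LEN = 50   # links per chain (illustrative)
SPACE = 10_000   # toy keyspace: the 4-digit PINs 0000..9999

def H(plain):
    """Hash function under attack (plain MD5 in this toy)."""
    return hashlib.md5(plain.encode()).digest()

def R(digest, column):
    """Reduction: map a digest back into the keyspace, varying by column."""
    n = (int.from_bytes(digest[:8], "big") + column) % SPACE
    return f"{n:04d}"

def build(starts):
    """Store only endpoint -> start points; the chain interiors are discarded."""
    table = {}
    for start in starts:
        p = start
        for col in range(CHAIN_LEN):
            p = R(H(p), col)
        table.setdefault(p, []).append(start)
    return table

def lookup(table, target):
    """Try each column the target hash could occupy, then replay the chain."""
    for col in range(CHAIN_LEN - 1, -1, -1):
        # Assume the target sits at column `col` and walk to the chain end.
        p = R(target, col)
        for c in range(col + 1, CHAIN_LEN):
            p = R(H(p), c)
        for start in table.get(p, []):
            # Replay from the start to find the preimage (or a false alarm).
            q = start
            for c in range(CHAIN_LEN):
                if H(q) == target:
                    return q
                q = R(H(q), c)
    return None

starts = [f"{i:04d}" for i in range(0, SPACE, 50)]  # 200 stored chains
table = build(starts)
print(lookup(table, H("0000")))
```

coverage is not guaranteed: a hash that never appears in any chain simply returns None. with a wordlist you already hold every candidate, so a plain dict from hash to word does the job and there is nothing to trade.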
that's great that you've made that list though.. i wanted word frequency tables for my startup which is an entirely unrelated type of project. if i hadn't found this i would have compiled it myself; thanks much :)
while your list doesn't have frequencies, i guess i can use the position in the list as a proxy for frequencies. but it's not optimal. any chance you can put up a list which also has the counts?
If you had a good training set of a million or so actually used passwords, you could use some machine learning techniques to make this go faster when coupled with the insights you use from natural language processing. Tragically, I doubt people would contribute their old passwords to a data set just to prove this.
The most important passwords I have are the ones for online banking and the like where you certainly aren't going to be doing bloody 25 billion attempts an hour.
After 3 attempts most high-security sites lock you out, and others introduce a captcha (which, though crackable, will introduce a delay). Any site is going to take note of (or go down under) 25 billion hits/hr, and on a single account at that (does total traffic/hour on a site like google.com even sum to this?).
The time taken is only to generate the permutations, not to crack a password.
The assumption is that you have access to the hashed version of the password. That's not so hard to get from some sites. I have retrieved user-password pairs using very simple SQL injection on some e-commerce sites (not any of the big ones, of course).
The numbers given make no sense, because it doesn't state which hash is being used, and the difference may be huge:
~/john-1.7.2/run$ ./john --test
Benchmarking: Traditional DES [128/128 BS SSE2]... DONE
Many salts: 906828 c/s real, 908646 c/s virtual
Only one salt: 805504 c/s real, 805504 c/s virtual
Benchmarking: BSDI DES (x725) [128/128 BS SSE2]... DONE
Many salts: 31271 c/s real, 31334 c/s virtual
Only one salt: 30617 c/s real, 30617 c/s virtual
Benchmarking: NT LM DES [128/128 BS SSE2]... DONE
Raw: 6575K c/s real, 6588K c/s virtual
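To put those rates in perspective, a quick back-of-the-envelope calculation in Python. The rates are taken from the benchmark above; the keyspace (8 lowercase letters) is my own assumption for illustration, and I'm reading "6575K" as 6,575,000. It also ignores per-format limits on password length and character set.

```python
# Rates (candidates per second) copied from the john --test output above.
RATES = {
    "Traditional DES (one salt)": 805_504,
    "BSDI DES (one salt)": 30_617,
    "NT LM DES (raw)": 6_575_000,  # "6575K" read as 6,575,000; an assumption
}

# Assumed keyspace for illustration: every string of 8 lowercase letters.
KEYSPACE = 26 ** 8

def hours_to_exhaust(keyspace, rate):
    """Worst-case hours to try every candidate at a given rate."""
    return keyspace / rate / 3600

for name, rate in RATES.items():
    print(f"{name}: {hours_to_exhaust(KEYSPACE, rate):,.1f} hours")
```

The same keyspace differs by a factor of a couple hundred between the slowest and fastest hash here, which is why a single "attempts per hour" figure says little.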
What I find unbelievable is that lots of web applications use simple MD5 passwords (not the FreeBSD MD5-based version but just plain MD5 hashes) without even using salts, which makes them almost instantly crackable using rainbow tables.
If you can get access to the hashed password, that's a major breach in and of itself.
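A small Python illustration of why the missing salt matters. The wordlist and the 8-byte salt are made up for the example, and a plain dict stands in for a precomputed table. Salted MD5 is still a fast hash, so this only shows the precomputation point, nothing more.

```python
import hashlib
import os

def md5_hex(data: bytes) -> str:
    """Hex digest of plain, unsalted MD5."""
    return hashlib.md5(data).hexdigest()

# Hypothetical attacker wordlist, hashed once and reusable against every site.
wordlist = ["123456", "password", "letmein"]
precomputed = {md5_hex(word.encode()): word for word in wordlist}

# Unsalted: the stored hash is the same everywhere, so one lookup cracks it.
stolen_unsalted = md5_hex(b"letmein")
print(precomputed.get(stolen_unsalted))

# Salted: a random per-user salt changes the hash, so the table no longer matches.
salt = os.urandom(8)  # assumed salt size, for illustration only
stolen_salted = md5_hex(salt + b"letmein")
print(precomputed.get(stolen_salted))
```

With a salt the attacker has to redo the work per user instead of once for everyone.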
It's almost comparable in my mind to saying, if you get physical access to the machine you can do exploits X, Y, and Z. Well, yeah, if you can get that far, you've pretty much won.