
We can do password guessing far better than brute force by generating candidate strings with a Markov model instead of at random. I wrote a paper on how to do this a few years ago: http://www.cs.utexas.edu/~shmat/abstracts.html#pwd Some of the techniques have since found their way into John the Ripper (and possibly other tools; I haven't checked). The upshot is that you can't calculate strength with a simple formula like this. At best you get an upper bound on the time needed to crack, and that bound can be too high by a few orders of magnitude.
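To illustrate the general idea (this is a minimal sketch, not the method from the linked paper), here is a character-level bigram Markov model that enumerates guesses from most to least probable. The function names, the `^`/`$` start and end markers, and the tiny training list are all assumptions made for the example; it also assumes the training strings contain neither marker.

```python
import heapq
from collections import defaultdict

def train_bigram_model(passwords):
    """Estimate P(next char | previous char); '^' marks start, '$' marks end."""
    counts = defaultdict(lambda: defaultdict(int))
    for pw in passwords:
        prev = "^"
        for ch in pw + "$":
            counts[prev][ch] += 1
            prev = ch
    # Normalize raw counts into transition probabilities.
    model = {}
    for prev, nxt in counts.items():
        total = sum(nxt.values())
        model[prev] = {ch: n / total for ch, n in nxt.items()}
    return model

def most_likely_guesses(model, limit, max_len=12):
    """Return up to `limit` (guess, probability) pairs, most probable first."""
    # Best-first search over prefixes. Probabilities only shrink as a prefix
    # grows, so finished strings come off the heap in non-increasing order.
    heap = [(-1.0, "", False)]  # (negated probability, text, finished?)
    guesses = []
    while heap and len(guesses) < limit:
        neg_p, text, finished = heapq.heappop(heap)
        if finished:
            guesses.append((text, -neg_p))
            continue
        prev = text[-1] if text else "^"
        for ch, p in model.get(prev, {}).items():
            if ch == "$":
                heapq.heappush(heap, (neg_p * p, text, True))
            elif len(text) < max_len:
                heapq.heappush(heap, (neg_p * p, text + ch, False))
    return guesses
```

Calling `most_likely_guesses(train_bigram_model(["abc", "abd", "abc"]), limit=2)` tries "abc" before "abd", because the b-to-c transition was seen twice as often. A brute-force enumerator would give both the same priority, which is why a keyspace-size formula overestimates the work.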



I created wordlists from Wikipedia database dumps some time ago (http://benjamin-schweizer.de/files/wordlist-wikipedia/); they are pretty large and thus useful for dictionary attacks. The wordlists are sorted, with the most common words at the top.

I think that there is a typical password length, so you could improve the sorting with a multi-dimensional rating scheme. I'd use expected password length and how common a word is as factors. Mixing these real words with computer-generated words might speed up brute force attacks.
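A rough sketch of such a two-factor rating, under loud assumptions: the length prior below is invented for illustration (not measured from real passwords), commonness is approximated as 1/rank from the word's position in the sorted list, and the names `LENGTH_PRIOR`, `score`, and `reorder` are hypothetical.

```python
import math

# Hypothetical share of passwords at each length. Made-up numbers.
LENGTH_PRIOR = {4: 0.05, 5: 0.10, 6: 0.25, 7: 0.20, 8: 0.25, 9: 0.10, 10: 0.05}

def score(word, rank, length_prior=LENGTH_PRIOR, floor=1e-4):
    """Combine commonness (1-based rank in the sorted list) with a length prior."""
    commonness = 1.0 / rank  # assumed Zipf-like falloff with rank
    length_weight = length_prior.get(len(word), floor)
    # Adding logs is the same as multiplying the two factors.
    return math.log(commonness) + math.log(length_weight)

def reorder(wordlist):
    """Re-sort a commonness-ordered wordlist so likelier passwords come first."""
    ranked = [(score(w, i), w) for i, w in enumerate(wordlist, start=1)]
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return [w for _, w in ranked]
```

With this prior, `reorder(["the", "of", "london", "password"])` moves "london" and "password" ahead of "the" and "of", since very short words get only the floor weight.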

However, I'm not sure how to integrate ordered wordfiles with rainbow tables. Any ideas?


I don't think word frequency is a good estimator of how likely a word is to be used as a password. Many frequently used words, like connectors or adverbs, are unlikely to be chosen. I expect proper names (of people, places, or cultural works) to be the most common passwords, and those are at a relative disadvantage in word frequency lists.


rainbow tables are an implementation of the time-space tradeoff concept: you are trying to search through a space so large you cannot enumerate it. if you have already enumerated it, as in a wordlist, it is not meaningful to use rainbow tables. it's not a question of how; it's not even a well-defined operation.

that's great that you've made that list though.. i wanted word frequency tables for my startup which is an entirely unrelated type of project. if i hadn't found this i would have compiled it myself; thanks much :)

while your list doesn't have frequencies, i guess i can use the position in the list as a proxy for frequencies. but it's not optimal. any chance you can put up a list which also has the counts?
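To make the time-space tradeoff point above concrete, here is a toy sketch of the chain idea behind rainbow tables. Only the start and end of each chain are stored, and the plaintexts in between are recomputed on lookup. The 4-letter lowercase keyspace, the choice of SHA-256, and every function name here are assumptions for illustration; real tables use far larger keyspaces and tuned reduction functions.

```python
import hashlib

ALPHABET = "abcdefghijklmnopqrstuvwxyz"
LENGTH = 4  # toy keyspace: 26**4 candidates

def h(plaintext):
    return hashlib.sha256(plaintext.encode()).hexdigest()

def reduce_to_candidate(digest, step):
    """Map a digest back into the keyspace; `step` varies the mapping per column."""
    n = int(digest, 16) + step
    chars = []
    for _ in range(LENGTH):
        n, r = divmod(n, len(ALPHABET))
        chars.append(ALPHABET[r])
    return "".join(chars)

def build_chain(start, chain_len):
    """Walk hash -> reduce `chain_len` times and return only the endpoint."""
    p = start
    for step in range(chain_len):
        p = reduce_to_candidate(h(p), step)
    return p

def build_table(starts, chain_len):
    # Stored data is one (endpoint -> start) pair per chain, not every plaintext.
    return {build_chain(s, chain_len): s for s in starts}

def lookup(table, target, chain_len):
    """Return a plaintext hashing to `target` if some chain covers it, else None."""
    for col in range(chain_len - 1, -1, -1):
        # Guess that target sits in column `col`, then walk to the chain's end.
        p = reduce_to_candidate(target, col)
        for step in range(col + 1, chain_len):
            p = reduce_to_candidate(h(p), step)
        if p in table:
            # Replay the chain from its start to recover the actual plaintext.
            cand = table[p]
            for step in range(chain_len):
                if h(cand) == target:
                    return cand
                cand = reduce_to_candidate(h(cand), step)
    return None
```

A wordlist, by contrast, is already the fully enumerated set of candidates, so there is nothing left to trade: you just hash each entry in order.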


I don't have a current dump, but if you send me an email, I can give you the script I've written to create those dumps. It prints out the counts.


If you had a good training set of a million or so actually used passwords, you could use some machine learning techniques to make this go faster, coupled with insights from natural language processing. Tragically, I doubt people would contribute their old passwords to a data set just to prove this.



