It might not catch the kinds of passwords that seem strong but end up on word lists. `correctbatteryhorsestaple`, and even more so `correctbatteryhorsestaple1` or `correctbatteryhorsestaple!`, would probably pass a "strength" test with flying colors, but you can bet they would get cracked in a moment by any script kiddie with a word list.
I remember seeing a list of cracked passwords and one of the ones they got was `!QAZ2wsx#EDC4rfv%TGB6yhn`. It passes every single password strength checker and dictionary word checker in the world, and still gets cracked.
zxcvbn actually correctly identifies that password as the result of the user hitting multiple adjacent keys on a standard QWERTY keyboard. It still passes the strength check, though, because zxcvbn doesn't identify the more complex pattern of moving top to bottom, then left to right across the keyboard, alternating shift with every column.
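To make the "top to bottom, then left to right, alternating shift" pattern concrete, here is a rough sketch of a check for it. This is not how zxcvbn works internally; the layout tables and the `is_column_walk` name are made up for illustration, and it assumes a US QWERTY layout.

```python
# Hypothetical sketch: detect "column walk" passwords on a US QWERTY layout.
# NOT zxcvbn's algorithm; the tables and function name are illustrative only.

ROWS = ["1234567890", "qwertyuiop", "asdfghjkl;", "zxcvbnm,./"]
SHIFTED_ROWS = ["!@#$%^&*()", "QWERTYUIOP", "ASDFGHJKL:", "ZXCVBNM<>?"]

# Map each shifted character back to the unshifted key that produces it.
UNSHIFT = {
    s: u
    for shifted, plain in zip(SHIFTED_ROWS, ROWS)
    for s, u in zip(shifted, plain)
}

# Keyboard columns read top to bottom: "1qaz", "2wsx", "3edc", ...
COLUMNS = ["".join(row[i] for row in ROWS) for i in range(10)]


def is_column_walk(password: str) -> bool:
    """True if the password is two or more consecutive columns, ignoring shift."""
    keys = "".join(UNSHIFT.get(ch, ch) for ch in password)
    if len(keys) < 8 or len(keys) % 4 != 0:
        return False
    chunks = [keys[i:i + 4] for i in range(0, len(keys), 4)]
    if chunks[0] not in COLUMNS:
        return False
    start = COLUMNS.index(chunks[0])
    # Every following chunk must be the next column to the right.
    return chunks == COLUMNS[start:start + len(chunks)]


print(is_column_walk("!QAZ2wsx#EDC4rfv%TGB6yhn"))
print(is_column_walk("correctbatteryhorsestaple"))
```

Ignoring shift entirely is a deliberate simplification: it catches the alternating-shift variant along with any other shift pattern laid over the same key sequence.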
This also shows that measurements of entropy are always relative to a particular party's knowledge, which is an interesting concept. (When we say that we're measuring the uncertainty that an attacker -- or message recipient -- has, that naturally depends on what the attacker or message recipient knows.)
Usually in entropy estimates you assume the attacker has full knowledge of the password generation method used. (Kerckhoffs's principle.) In reality, most attackers won't know what generation method you used, but it's better not to rely on security by obscurity when it comes to passwords.
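A back-of-the-envelope sketch of how much the assumed generation method matters for the 24-character keyboard password mentioned above. Both models are assumptions for illustration: the first treats it as 24 independent picks from the 94 printable ASCII characters, the second assumes the attacker knows it is a column walk and only has to guess a starting column, a column count, and which shift state comes first.

```python
import math

PRINTABLE_ASCII = 94  # printable ASCII characters, excluding space

# Model 1: attacker assumes 24 uniformly random printable characters.
naive_bits = 24 * math.log2(PRINTABLE_ASCII)

# Model 2 (hypothetical): attacker knows the column-walk method and guesses
# the starting column (10), the number of columns (up to 10), and whether
# the first column is shifted (2).
pattern_bits = math.log2(10 * 10 * 2)

print(f"random-character model: {naive_bits:.1f} bits")
print(f"known-pattern model:    {pattern_bits:.1f} bits")
```

Same string, two wildly different numbers, depending only on what the attacker is assumed to know.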
zxcvbn accounts for the use of word lists. (And keyboard patterns, and common dates, and repeated characters, and a dozen or so other common patterns you probably haven't thought of yet.) Try it yourself: https://dl.dropboxusercontent.com/u/209/zxcvbn/test/index.ht...
And four random words is actually quite strong. The XKCD comic that password is taken from accounts for the use of word lists in its entropy calculation. In fact, it even _assumes_ the attacker knows the exact 2048-word dictionary you're selecting the words from. Even under those assumptions, four random words is _still_ a pretty strong password.
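The arithmetic behind that claim is short enough to sketch. The function name is made up for illustration, and the calculation assumes each word is picked uniformly and independently from the list, with the attacker knowing the list.

```python
import math


def passphrase_entropy_bits(wordlist_size: int, num_words: int) -> float:
    """Entropy in bits of num_words uniform, independent picks from a known list."""
    return num_words * math.log2(wordlist_size)


bits = passphrase_entropy_bits(wordlist_size=2048, num_words=4)
search_space = 2 ** int(bits)

print(f"{bits:.0f} bits, {search_space:,} possible passphrases")
```

Each word from a 2048-word list contributes 11 bits, so four words give 44 bits. Appending `1` or `!` adds very little under this model, but the 44 bits come from the random selection of words, not from hiding the method.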
But a brute force test like the parent comment described wouldn't catch that either, unless it had 'correctbatteryhorsestaple' as a word in one of its dictionaries. And if you're going to go that route, it's just as easy to put 'correctbatteryhorsestaple' in one of zxcvbn's dictionaries.
Any common password pattern you could catch via brute force could also be detected via zxcvbn, except that zxcvbn would be much faster and more efficient at it.
Yes, the info I was missing, which you provided in your first reply, was that zxcvbn does use word lists. I should have acknowledged that in my reply. Thank you.