With GCC 4.5 on my Core i7 860, switching the trivial loop mentioned from "Complement and Compare" to popcnt takes it from ~10.5s to ~7.5s.
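For anyone wanting to try the comparison, here is a rough sketch of what the two variants might look like. It assumes that "Complement and Compare" refers to the power-of-two test that compares x with x & (~x + 1), and that the popcnt version simply checks for exactly one set bit. The function names and the counting loop are made up for illustration and are not the original benchmark. `__builtin_popcount` is a GCC builtin; it should only compile down to the hardware popcnt instruction when built with something like `-mpopcnt` or `-msse4.2`.

```c
#include <stdint.h>

/* Assumed form of "Complement and Compare": x is a power of two if it is
   nonzero and equal to its own lowest set bit, x & (~x + 1). */
int is_pow2_complement_compare(uint32_t x)
{
    return x != 0 && (x & (~x + 1u)) == x;
}

/* popcnt variant: a power of two has exactly one bit set.
   __builtin_popcount is a GCC builtin, not standard C. */
int is_pow2_popcnt(uint32_t x)
{
    return __builtin_popcount(x) == 1;
}

/* Hypothetical stand-in for the "trivial loop": count the powers of two
   in [0, limit) using the popcnt test. */
uint32_t count_pow2_popcnt(uint32_t limit)
{
    uint32_t count = 0;
    for (uint32_t x = 0; x < limit; x++)
        count += (uint32_t)is_pow2_popcnt(x);
    return count;
}
```

Timings will depend heavily on compiler flags and on whether the compiler inlines the test, so the numbers above should not be expected to reproduce exactly.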