FWIW, in gcc/g++ there are compiler intrinsics which (should) map to that instruction on CPUs where it's available: __builtin_popcount, __builtin_popcountl and __builtin_popcountll for unsigned int, unsigned long and unsigned long long respectively. Visual C++ provides equivalent functions for Windows (but I can't remember what they're called).
It does seem odd that the article misses this approach out.
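For anyone who wants to try the builtins mentioned above, here's a minimal sketch in C. The names popcount64 and popcount_fallback are made up for illustration; only __builtin_popcountll is the real GCC/Clang builtin. The fallback loop is the usual clear-the-lowest-set-bit trick, used here only so the code still compiles on other compilers.

```c
/* Hypothetical helper names; only __builtin_popcountll is a real builtin. */

/* Portable fallback: each iteration clears the lowest set bit,
   so the loop runs once per set bit. */
static unsigned popcount_fallback(unsigned long long x)
{
    unsigned n = 0;
    while (x) {
        x &= x - 1;   /* drop the lowest set bit */
        n++;
    }
    return n;
}

/* Count set bits in a 64-bit value, using the compiler builtin
   when building with GCC or Clang. */
static unsigned popcount64(unsigned long long x)
{
#if defined(__GNUC__) || defined(__clang__)
    return (unsigned)__builtin_popcountll(x);
#else
    return popcount_fallback(x);
#endif
}
```

As far as I know, GCC only turns the builtin into a single popcnt instruction when the target is told to have it (e.g. -mpopcnt, or -march=native on a CPU that supports it); otherwise it quietly calls a library routine instead, so it's worth checking the assembly output with -S.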
Unfortunately __builtin_popcount isn't emitting a popcnt instruction with the GCC I've got here, even using -msse4.2. I believe that very recent GCC does get this right.