Hacker News new | past | comments | ask | show | jobs | submit login

FWIW, in gcc/g++ there are compiler intrinsics which (should) map to that instruction on CPUs where it's available: __builtin_popcnt, __builtin_popcountl and __builtin_popcountll for unsigned ints, unsigned longs and unsigned long longs respectively. Visual C++ provides equivalent functions for Windows (but I can't remember what they're called).

It does seem odd that the article misses this approach out.




Unfortunately __builtin_popcnt isn't emitting a popcnt instruction with the GCC I've got here, even using -msse4.2. I believe that very recent GCC does get this right.


GCC 4.4 isn't very recent, but it generates a popcnt with -msse4.2.

GCC 4.5, using popcnt on my Core-i7 860 takes the trivial loop mentioned using "Complement and Compare" from ~10.5s to ~7.5s




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: