The benchmarks are crap. The loop/function call overhead time dominates. They should list the benchmark for when the implementation is "return x;" That gives a baseline number. On my machine I have to run a few thousand runs before I can distinguish between "return x;" and "return ((x != 0) && !(x & (x - 1)));"
They both take about 10s for 232 iterations.