
Best case seems like a poor metric when the CPU scheduler could easily cause 7% variation. I would be interested in seeing, say, 100x the number of runs, reporting the mean rather than the best, since one usually cares about the average more than the best.

I also wish I knew what optimization settings GCC etc. were using, and what effect tweaking them has.




Because of noise in general, "best case" always seems like the best metric to me. Over a large number of runs, you're likely to hit the "perfect" measurement on a microbenchmark.

Otherwise, for an "adaptive" number of runs, until enough time has been spent to have some "confidence" in the measurement, I've been fairly happy with https://github.com/google/benchmark/
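
A minimal sketch of what that adaptive-iteration approach looks like with google/benchmark; the sorting workload, BM_Sort, and the sizes are just placeholders for illustration, not the code from the article:

    #include <benchmark/benchmark.h>
    #include <algorithm>
    #include <random>
    #include <vector>

    // The library repeats the timed loop as many times as it needs to
    // reach a statistically stable estimate, then reports the result.
    static void BM_Sort(benchmark::State& state) {
      std::vector<int> data(state.range(0));
      std::mt19937 rng(42);
      for (auto& v : data) v = static_cast<int>(rng());
      for (auto _ : state) {
        state.PauseTiming();
        std::shuffle(data.begin(), data.end(), rng);  // setup excluded from timing
        state.ResumeTiming();
        std::sort(data.begin(), data.end());
        benchmark::DoNotOptimize(data.data());  // keep the result from being optimized away
      }
    }
    BENCHMARK(BM_Sort)->Arg(1 << 10)->Arg(1 << 16);
    BENCHMARK_MAIN();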


Just show more statistics: at least the mean, variance, min, and max.
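
A quick sketch of what reporting those statistics could look like, assuming a plain std::chrono harness; report_timings and the 100-run default are made up for illustration:

    #include <algorithm>
    #include <chrono>
    #include <cmath>
    #include <cstdio>
    #include <numeric>
    #include <vector>

    // Times fn() `runs` times and prints mean, standard deviation, min, and max.
    template <typename Fn>
    void report_timings(Fn fn, int runs = 100) {
      std::vector<double> ns;
      for (int i = 0; i < runs; ++i) {
        auto t0 = std::chrono::steady_clock::now();
        fn();
        auto t1 = std::chrono::steady_clock::now();
        ns.push_back(std::chrono::duration<double, std::nano>(t1 - t0).count());
      }
      double mean = std::accumulate(ns.begin(), ns.end(), 0.0) / ns.size();
      double var = 0.0;
      for (double x : ns) var += (x - mean) * (x - mean);
      var /= ns.size();
      auto mm = std::minmax_element(ns.begin(), ns.end());
      std::printf("mean %.0f ns  stddev %.0f ns  min %.0f ns  max %.0f ns\n",
                  mean, std::sqrt(var), *mm.first, *mm.second);
    }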


The timing _should_ be constant from run to run, so the best case is the best way to remove scheduler variation. I tried the mean as well, and the results aren't that different.

Optimization is -O3 (see the attached Makefile at the bottom).


> I also wish I knew what optimization settings GCC etc. were using, and what effect tweaking them has.

From the Makefile:

    GCCFLAGS = -O3 --std=c++11

    MSFLAGS = /nologo /Ox /Ob2 /Ot /Oi /GL


Would -march=native and -fstrict-aliasing make any difference?

It would be interesting to compare the compiled asm with the hand-rolled one.

The code also has some potential improvements, such as re-reading pivot.key inside the loop even though it doesn't change, but maybe the compiler is smart enough to find them. A sketch of that hoisting is below.
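
A hypothetical illustration of the hoisting; the Item struct, count_below, and the loop shape are assumptions, not the article's actual code:

    struct Item { int key; };

    // pivot.key is loop-invariant, so read it once up front instead of on
    // every iteration. With -O3 the compiler will usually do this itself.
    int count_below(const Item* items, int n, const Item& pivot) {
      const int pivot_key = pivot.key;
      int i = 0;
      while (i < n && items[i].key < pivot_key) ++i;
      return i;
    }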


-march=native would almost certainly help, but I'm pretty sure -fstrict-aliasing is the default.



