That sounds like a pretty fragile test - could you see it failing if I plug my laptop into AC power while the test is running? (Intel SpeedStep or whatever it's called will take my laptop from 2 GHz to 3 GHz when I do that).
> could you see it failing if I plug my laptop into AC power while the test is running?
That should at most cause a single test case failure, since the speedup would only affect one leg of the benchmark.
Since these are not completely automated (you might want them as a set of allowed-to-fail tests, or tests you have to request explicitly, as in the sketch below), you can simply re-run them to confirm the prior result wasn't a fluke. You need something like that anyway: you're essentially asserting on benchmark output, and it's not easy to set up perfectly consistent benchmarks in the first place.
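A minimal sketch of the "explicitly requested" idea, assuming pytest; the RUN_PERF_TESTS environment variable and the test name are illustrative, not from any existing project:

```python
import os
import pytest

# Skip unless the developer explicitly asks for the perf suite,
# e.g.  RUN_PERF_TESTS=1 pytest
perf = pytest.mark.skipif(
    not os.environ.get("RUN_PERF_TESTS"),
    reason="perf regression tests must be requested explicitly",
)

@perf
def test_perfhack_still_pays_off():
    ...  # benchmark assertion goes here
```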
The margin by which you want your PERFHACK to be faster than the default code is also your margin of error. In the test you would assert something like NORMAL_SECONDS * 0.8 > PERFHACK_SECONDS to ensure you're still getting a 20% or better speedup. If some change causes that to drop to 15%, you then get to assess whether the gain is still worth the hack (and perhaps relax the test to a 10% assertion), or whether you want to clean up your code (or at least schedule that cleanup).
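A minimal sketch of that assertion, assuming a timeit-based harness; run_normal() and run_perfhack() are hypothetical stand-ins for the default and PERFHACK code paths:

```python
import timeit

# Stand-ins for the real code paths; in practice these would call the
# default implementation and the PERFHACK implementation respectively.
def run_normal():
    sum(i * i for i in range(200_000))

def run_perfhack():
    sum(i * i for i in range(100_000))

def bench(fn, repeats=5):
    # Take the best of several runs to dampen one-off machine noise
    # (e.g. the CPU changing clock speed partway through).
    return min(timeit.repeat(fn, number=1, repeat=repeats))

def test_perfhack_is_at_least_20_percent_faster():
    normal_seconds = bench(run_normal)
    perfhack_seconds = bench(run_perfhack)
    # The 0.8 factor is both the required speedup and the error margin.
    assert normal_seconds * 0.8 > perfhack_seconds
```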
The point is that you've put a system in place that gives you a clearer picture of whether your prior decisions about performance hacks still make sense.