Here's exactly what I was told: "This was achieved by manually unrolling a 10-step loop, which [the] compiler apparently could not optimize."
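The quote doesn't show the original loop, so here is a minimal C sketch of what "manually unrolling a 10-step loop" usually means. It assumes a hypothetical loop body that sums 10 integers. The names `STEPS`, `sum_rolled`, and `sum_unrolled` are illustrative only and do not come from the code being described.

```c
#include <stddef.h>

/* Hypothetical example: the real loop body was not shown, so this sums 10 ints. */
#define STEPS 10 /* the 10-step trip count mentioned in the quote */

/* Rolled form: one copy of the body, plus a counter update and a
   loop-exit branch on every iteration. */
long sum_rolled(const int a[STEPS]) {
    long total = 0;
    for (size_t i = 0; i < STEPS; ++i) {
        total += a[i];
    }
    return total;
}

/* Manually unrolled form: the body is written out once per step, so
   there is no counter and no loop branch left to execute. */
long sum_unrolled(const int a[STEPS]) {
    long total = 0;
    total += a[0];
    total += a[1];
    total += a[2];
    total += a[3];
    total += a[4];
    total += a[5];
    total += a[6];
    total += a[7];
    total += a[8];
    total += a[9];
    return total;
}
```

Both functions return the same result. The unrolled version only removes the loop bookkeeping. Whether a compiler would have done this on its own depends on the compiler, its optimization flags, and the real loop body. The quote leaves all three unstated.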