IIRC, according to Andres Freund, the perf regression only happened on machines using the -fno-omit-frame-pointer setting, which was not the default at that point.
The -fno-omit-frame-pointer bit is separate from the slowdown. -fno-omit-frame-pointer led to valgrind warnings, but no perceptible slowdown in most cases (including this one).