https://www.pagetable.com/?p=410 may go some way toward explaining this. The behavior described there differs slightly from what the video shows. I believe this is because the video uses a 65C02 rather than the original NMOS 6502, and the 65C02 implements this a little differently.
Brilliant; that makes perfect sense of the "magic." It'd be interesting to compare a "dump" of the 65C02's decode ROM against the 6502's decode PLA, to see exactly what the 65C02 is doing with those few added cycles in what is presumably its generic BRK implementation.