My understanding is that OpenAI did indeed find diminished capability across a range of tasks after doing RLHF. You're right to question this, though, as I believe the opposite was true of GPT-3, where RLHF improved performance on certain tasks.

The benefits from a business perspective were still clear, however, and of course the instruction-tuned GPT-4 model still outperformed GPT-3 in general.

There are probably some weird edge cases and nuances that I'm missing - and I'd be happy to be corrected.