What has changed with CoT and high-compute inference is not yet clear. My point is that if it makes bare prompt injection harder for humans to pull off, then we shouldn't call it a fundamental limitation anymore.
Are LLMs nothing more than auto-regressive stochastic parrots? Perhaps not anymore, depending on test-time compute, native specialty tokens, etc.