Jean-Papoulos 3 months ago | on: There may not be aha moment in R1-Zero-like traini...

>We found Superficial Self-Reflection (SSR) from base models’ responses, in which case self-reflections do not necessarily lead to correct final answers.

I must be missing something here. No one was arguing that the AI's answers are correct to begin with, just that self-reflection leads to more correct answers than not using the process?