It's more that you have to allocate the compute the right way.
Noam Brown's analogy: you could train a massive one-shot foundation model to predict the next best Go move, but that would be wasteful. Better to use some test-time search. You get better results for less money.
The same thing is happening in LLMs.
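The idea can be sketched with a toy example, assuming nothing about any real Go engine: a myopic one-shot "policy" picks the move that looks best immediately, while a small amount of test-time search enumerates continuations and picks the move whose best final outcome is highest. All names and rewards here are made up for illustration.

```python
# Toy 2-step game: pick a first move, then a second; reward arrives at the end.
# A weak one-shot "policy" scores moves myopically; test-time search
# looks ahead and scores each first move by its best final reward.

# Final rewards for each (first_move, second_move) pair (illustrative values).
REWARDS = {
    ("a", "x"): 1, ("a", "y"): 2,
    ("b", "x"): 0, ("b", "y"): 10,  # the best line starts with the "worse-looking" move
}

def myopic_policy(move):
    # One-shot heuristic: thinks "a" looks better up front.
    return {"a": 5, "b": 1}[move]

def one_shot(moves=("a", "b")):
    # Trust the policy's single forward pass.
    return max(moves, key=myopic_policy)

def search(moves=("a", "b")):
    # Spend test-time compute: enumerate continuations, score final rewards.
    def best_outcome(first):
        return max(REWARDS[(first, second)] for second in ("x", "y"))
    return max(moves, key=best_outcome)

print(one_shot())  # policy alone picks "a"
print(search())    # search finds "b", whose best continuation scores 10
```

The point of the sketch: the search version reaches a better answer without any bigger model, just extra compute at inference time.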