I think Rich Sutton's bitter lesson will prove to apply here, and what we really need to advance machine learning capabilities are more general and powerful models capable of learning for themselves - better able to extract and use knowledge from the firehose of data available from the real world (ultimately via some form of closed-loop deployment where they can act and incrementally learn from their own actions).

What OpenAI have delivered here is basically a hack - a neuro-symbolic agent with a bunch of hard-coded "reasoning" biases built in (via RL). It's a band-aid approach that tries to provide some of what's missing from the underlying model, which was never designed for what it's now being asked to do.




It works like our own minds in that we also think, test, go back, try again. This doesn't seem like a failing but just a recognition that thought can proceed in that way.


The "failing" here isn't the short term functional gains, but rather the choice of architectural direction. Trying to add reasoning as an ad-hoc wrapper around the base model, based on some fixed reasoning heuristics (built in biases) is really a dead-end approach. It would be better to invest in a more powerful architecture capable of learning at runtime to reason for itself.

Bespoke hand-crafted models/agents can never compete with ones that can just be scaled and learn for themselves.


o1 is an application of the Bitter Lesson. To quote Sutton: "The two methods that seem to scale arbitrarily in this way are *search* and learning." (emphasis mine -- in the original Sutton also emphasized learning).

OpenAI and others have previously pushed the learning side, while neglecting search. Now that gains from adding compute at training time have started to level off, they're adding compute at inference time.
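
As a rough sketch of what spending compute at inference time can look like - best-of-N sampling against a scoring model - here is some hypothetical Python, where generate and score are stand-in callables (OpenAI hasn't published o1's actual mechanism):

    # Sample n candidate reasoning traces and keep the one the scorer likes best.
    # Larger n means more inference compute, with no change to training.
    def best_of_n(prompt, generate, score, n=16):
        candidates = [generate(prompt) for _ in range(n)]
        return max(candidates, key=score)

Doubling n roughly doubles per-query cost, which is the training-time vs inference-time trade being made.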


I think the key part of the bitter lesson is that the (scalable) ability to learn from data should be favored over built-in biases.

There are at least three major built-in biases in o1:

- specific reasoning heuristics hard-coded into the RL decision making

- the architectural split between pre-trained LLM and what appears to be a symbolic agent calling it

- the reliance on one-time, SGD-driven learning (common to all these pre-trained transformers)

IMO search (reasoning) should be an emergent behavior of a predictive architecture capable of continual learning - chained what-if prediction.
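
A toy version of what I mean by chained what-if prediction, with predict_next and evaluate standing in for a learned world model and value estimate (this is just an illustration of the idea, not any shipping system):

    import itertools

    # Imagine each possible action chain with the predictive model,
    # then act on the chain whose predicted outcome scores best.
    def plan_by_rollout(state, actions, predict_next, evaluate, depth=3):
        best_chain, best_value = None, float("-inf")
        for chain in itertools.product(actions, repeat=depth):
            s = state
            for a in chain:  # chained what-if: predict the state after each action
                s = predict_next(s, a)
            value = evaluate(s)
            if value > best_value:
                best_chain, best_value = chain, value
        return best_chain

The point is that search falls out of chaining predictions and scoring the imagined outcomes, rather than from hand-written reasoning rules.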



