I think Rich Sutton's bitter lesson will prove to apply here, and what we really need to advance machine learning capabilities are more general and powerful models capable of learning for themselves - better able to extract and use knowledge from the firehose of data available from the real world (ultimately via some form of closed-loop deployment where they can act and incrementally learn from their own actions).

What OpenAI have delivered here is basically a hack - a neuro-symbolic agent with a bunch of hard-coded "reasoning" biases built in (via RL). It's a band-aid approach that tries to provide some of what's missing from the underlying model, which was never designed for what it's now being asked to do.




It works like our own minds in that we also think, test, go back, try again. This doesn't seem like a failing but just a recognition that thought can proceed in that way.


The "failing" here isn't the short term functional gains, but rather the choice of architectural direction. Trying to add reasoning as an ad-hoc wrapper around the base model, based on some fixed reasoning heuristics (built in biases) is really a dead-end approach. It would be better to invest in a more powerful architecture capable of learning at runtime to reason for itself.

Bespoke hand-crafted models/agents can never compete with ones that can just be scaled and learn for themselves.


o1 is an application of the Bitter Lesson. To quote Sutton: "The two methods that seem to scale arbitrarily in this way are *search* and learning." (emphasis mine -- in the original Sutton also emphasized learning).

OpenAI and others have previously pushed the learning side, while neglecting search. Now that gains from adding compute at training time have started to level off, they're adding compute at inference time.
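
As a rough sketch of what spending compute at inference time can look like - best-of-N sampling against a scoring model - here is some hypothetical Python, where generate and score are stand-in callables (OpenAI hasn't published o1's actual mechanism):

    # Sample n candidate reasoning traces and keep the one the scorer likes best.
    # Larger n means more inference compute, with no change to training.
    def best_of_n(prompt, generate, score, n=16):
        candidates = [generate(prompt) for _ in range(n)]
        return max(candidates, key=score)

Doubling n roughly doubles per-query cost, which is the training-time vs inference-time trade being made.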


I think the key part of the bitter lesson is that the (scalable) ability to learn from data should be favored over built-in biases.

There are at least three major built-in biases in o1:

- specific reasoning heuristics hard-coded into the RL decision making

- the architectural split between pre-trained LLM and what appears to be a symbolic agent calling it

- the reliance on one-time, SGD-driven learning (common to all these pre-trained transformers)

IMO search (reasoning) should be an emergent behavior of a predictive architecture capable of continual learning - chained what-if prediction.
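
A toy version of what I mean by chained what-if prediction, with predict_next and evaluate standing in for a learned world model and value estimate (this is just an illustration of the idea, not any shipping system):

    import itertools

    # Imagine each possible action chain with the predictive model,
    # then act on the chain whose predicted outcome scores best.
    def plan_by_rollout(state, actions, predict_next, evaluate, depth=3):
        best_chain, best_value = None, float("-inf")
        for chain in itertools.product(actions, repeat=depth):
            s = state
            for a in chain:  # chained what-if: predict the state after each action
                s = predict_next(s, a)
            value = evaluate(s)
            if value > best_value:
                best_chain, best_value = chain, value
        return best_chain

The point is that search falls out of chaining predictions and scoring the imagined outcomes, rather than from hand-written reasoning rules.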



