Yeah, +1. Looking back at the WebVoyager [1] and GPT-4V generalist agent [2] papers from last January, it feels like we haven't come that far.
But there are now several major technical unlocks: fine-tuning for cursor locations (in Claude), better reasoning with o3, and RL fine-tuning so we can learn based on task success.
That gives me significant hope.
[1] https://arxiv.org/abs/2401.13919
[2] https://arxiv.org/abs/2401.01614