Yeah, +1. Looking back to the WebVoyager [1] and GPT4V generalist agent [2] papers from last January, it feels like we haven't come that far.

But there are now several major technical unlocks: fine-tuning for cursor locations (in Claude), better reasoning with o3, and RL fine-tuning so agents can learn from task success.

That gives me significant hope.

[1] https://arxiv.org/abs/2401.13919

[2] https://arxiv.org/abs/2401.01614