The newer models are definitely more useful. Back in the GPT-3.5 and GPT-4 days, AutoGPT applied the same types of tools, but you had to be pretty lucky for it to get anywhere. Now Claude 3.7, Gemini 2.5, and o3 make far fewer mistakes and are better able to get back on track when a mistake is discovered. So they're more convincing as intelligent helpers.
Good point. I still wonder whether o3 has better command of tools because it's significantly smarter in general, or whether it's "just" been trained with a specific focus on using tools better, if that makes sense.