I haven't really noticed that myself. I go "LLM shopping" fairly frequently, trying to find which of the few I'm paying for gives the best result for the current problem. They all seem to have their shortfalls, although I will say Claude is better for greenfield work.
Well, you could probably try dropping down to simpler models to get the idea (they're almost useless). In my experience, better models like Claude or o3 can do things that others simply cannot. Past a certain level of complexity they start going in circles, making wrong decisions and forgetting things that are still in the context window. But the thing is, those complex tasks are usually the most interesting and important ones.