My hunch is this comes down to personal prompting style. It's likely that your own style works more effectively with GPT-4o, while other people have styles that are more effective with Claude 3.5 Sonnet.
I would add that the task matters too. I don't think there's yet a model that is consistently better at everything. I still revert to plain old GPT-4 for translating text into English when it needs creative editing to fit a specific style; of all the Claudes and GPTs, it's the one that gives me the best output (to my taste). For categorisation tasks, on the other hand, either GPT-4o or Claude 3.5 can come out ahead, depending on the subject and the desired output. The same applies to coding tasks, although with complex prompts Claude 3.5 does seem better at getting more of the details right.