I've had a coding project where I actually preferred 4o's outputs to DeepSeek R1's, though it was a bit of a niche use case (a long script to parse the DOM output of web pages).
Also, they just updated 4o recently and it's even better now. o3-mini-high is solid as well; I try it when 4o fails.
One issue I have with most models is that when they're rewriting my long scripts, they tend to drop a few lines or variables here and there, which makes for some really frustrating debugging. o1 has actually been pretty decent here so far. I'm definitely a bit of a power user: I really try to push the models to do as much as possible with long code contexts.
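
For what it's worth, here is a rough sketch of how the dropped-variable problem could be caught before it turns into a debugging session. It assumes the scripts are Python, and the helper names `defined_names` and `dropped_names` are made up for illustration. It only compares which names each version defines, so it won't notice a dropped line that defines nothing new.

```python
import ast


def defined_names(source: str) -> set[str]:
    """Collect the names a script defines: functions, classes, assigned variables."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            names.add(node.name)
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            names.add(node.id)
    return names


def dropped_names(original: str, rewritten: str) -> set[str]:
    """Names defined in the original script but missing from the rewrite."""
    return defined_names(original) - defined_names(rewritten)


# Toy example (hypothetical scripts): the rewrite silently loses two names.
original = """
timeout = 30
retries = 3

def parse(html):
    rows = html.split("<tr>")
    return rows
"""

rewritten = """
timeout = 30

def parse(html):
    return html.split("<tr>")
"""

print(sorted(dropped_names(original, rewritten)))  # ['retries', 'rows']
```

A name that shows up here isn't necessarily a bug (`rows` above was just inlined), but it gives a short list to check by eye instead of diffing the whole script.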