- Gemini 2.5 Pro is absolutely a beast at coding, perhaps the best model right now
- They spent all the computing resources on training it on coding data and forgot to give it a distinct personality.
- It doesn’t do as well on reasoning as Grok 3 (Think) and Claude 3.7 Sonnet (thinking)
- On par with o3-mini-high in general mathematics
If you’re a coder, you’ll love it; otherwise, you will be fine with other frontier reasoning models (DeepSeek R1, if you ask me)
DeepSeek V3 0324 is the first open-source model to match SOTA coding performance
- Understands user intention better than before; I’d say it’s better than Claude 3.7 Sonnet base and thinking. 3.5 is still better at this (perhaps the best)
- Again, in raw code-generation quality, it is better than 3.7, and on par with 3.5 or sometimes better.
- Great at reasoning, much better than any non-reasoning models available now.
- Better at instruction following than 3.7 Sonnet but below 3.5 Sonnet.
This is a step towards a human-machine hybrid world. Putting a human in the loop can do wonders. Sure, it is expensive now, but the subsequent iterations will crush it.
Have you heard of Centaur chess? A human and a machine would team up to find the best chess moves against another similar team. It's not a thing anymore. Computers have advanced so much that humans can't really contribute in any meaningful sense.
All these AI models do quite well in games because there are set rules, finite moves, and they can iterate in a tight loop (without humans) to get immediate feedback on pass/fail.
I think this is what explains the difference in how quickly AIs have gone from ok -> good -> great -> better than humans at, say, chess, versus, say, driving a car, summarizing a paper, understanding human requests, recommending music, etc.
I think a lot of people are extrapolating the rate of progress & possible accuracy rates from chess bots to domains that do not compare.
Is the point of your comment to make people feel depressed?
Either we're going to use these tools to augment our abilities, or we basically just get wiped out. At least our jobs will be, and there is no plan to provide support for anyone. Maybe the tech will make the transition to a post-employment world so swift we don't even feel any negative economic effects at all, but let's see.
Depressing hasn’t been the reality for the majority of people over the last 100 years of technological progress. You could die from a scratch or a kidney stone 100 years ago.
Once we realize we can make machines that can beat us in ways we can’t even understand, I wonder if we will question whether we have always been influenced this way by an exterior force.