History has shown being first to market isn't all it's cracked up to be. You spend more, you do the hard work of blazing the trail others will follow, you end up with a tech stack built before tools and patterns stabilized, and you've created a giant superhighway for a fast-follower. Anyone remember MapQuest, AltaVista or Hotmail?
OpenAI has some very serious competition now. When you combine that with the recent destabilizing saga they went through along with commoditization of models with services like OpenRouter.ai, I'm not sure their future is as bright as their recent valuation indicates.
They seem to be going after different markets, or at least to have differing degrees of success in the markets they each target.
OpenAI is most successful in the consumer chat app (ChatGPT) market.
Anthropic is most successful in the business API market.
OpenAI currently has a lot more revenue than Anthropic, but it's mostly from ChatGPT. For API use the revenue numbers of both companies are roughly the same. API success seems more important than chat apps since this will scale with the success of the user's business, and this is really where the dream of an explosion in AI profits comes from.
ChatGPT's user base size vs that of Claude's app may be first mover advantage, or just brand recognition. I use Claude (both web based and iOS app), but still couldn't tell you if the chat product even has a name distinct from the model. How's that for poor branding?! OpenAI have put a lot of effort into the "her" voice interface, while Anthropic's app improvements are more business orientated in terms of artifacts (which OpenAI have now copied) and now code execution.
Just wanted to add a note to this. Tool calling - particularly to source external current data - is something that's had the big foundational LLM providers very nervous so they've held back on it, even though it's trivial to implement at this point. But we're seeing it rapidly emerge with third party providers who use the foundational APIs. Holding back tool calling has limited the complex graph-like execution flows that the big providers could have implemented on their user facing apps e.g. the kind of thing that Perplexity Pro has implemented. So they've fallen behind a bit. They may catch up. If they don't they risk becoming just an API provider.
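To make the mechanics concrete: here's a minimal sketch of the tool-calling loop the comment is describing. The `chat()` function, the message schema, and the `get_weather` tool are all hypothetical stand-ins; real provider APIs (OpenAI, Anthropic) each use their own schemas, but the control flow has this same shape, and chaining several such calls is what produces the graph-like execution flows mentioned above.

```python
import json

def get_weather(city):
    # Hypothetical tool: stand-in for any external, current data source.
    return {"city": city, "temp_c": 21}

TOOLS = {"get_weather": get_weather}

def run_conversation(chat, user_message):
    """Drive a model until it gives a final answer instead of a tool call.

    `chat` is a hypothetical client callable that takes a message list and
    returns either {"tool_call": None, "content": ...} for a final answer
    or {"tool_call": {"name": ..., "arguments": {...}}} for a tool request.
    """
    messages = [{"role": "user", "content": user_message}]
    while True:
        reply = chat(messages)
        if reply.get("tool_call") is None:  # plain answer: we're done
            return reply["content"]
        call = reply["tool_call"]
        result = TOOLS[call["name"]](**call["arguments"])
        # Feed the tool result back so the model can incorporate it.
        messages.append({"role": "tool", "name": call["name"],
                         "content": json.dumps(result)})
```

The loop itself is trivial, which is the comment's point: the hard part for the big providers has been policy, not engineering.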
Most people's first exposure to LLMs was ChatGPT, and that was only what - like 18 months ago it really took off in the mainstream? We're still very early on in the grand scheme of things.
Yes it's silly to talk about first mover advantage in sub 3 years. Maybe in 2026 we can revisit this question and see if being the first mattered.
First mover advantage being a general myth doesn't mean being the first to launch and then immediately dominating the wider market for a long period is impossible. It just usually means their advantage was about a lot more than simply being first.
Claude is only better in some cherry picked standard eval benchmarks, which are becoming more useless every month due to the likelihood of these tests leaking into training data. If you look at the Chatbot Arena rankings where actual users blindly select the best answer from a random choice of models, the top 3 models are all from OpenAI. And the next best ones are from Google and X.
I'm subscribed to all of Claude, Gemini, and ChatGPT. Benchmarks aside, my go-to is always Claude. Subjectively speaking, it consistently gives better results than anything else out there. The only reason I keep the other subscriptions is to check in on them occasionally to see if they've improved.
I don't pay any attention to leaderboards. I pay for both Claude and ChatGPT and use them both daily for anything from Python coding to the most random questions I can think of. In my experience Claude is better (much better) than ChatGPT in almost all use cases. Where ChatGPT shines is the voice assistant - it still feels almost magical having a "human-like" conversation with the AI agent.
Anecdotally, I disagree. Since the release of the "new" 3.5 Sonnet, it has given me consistently better results than Copilot based on GPT-4o.
I've been using LLMs as my rubber duck when I get stuck debugging something and have exhausted my standard avenues. GPT-4o tends to give me very general advice that I have almost always already tried or considered, while Claude is happy to say "this snippet looks potentially incorrect; please verify XYZ" and it has gotten me back on track in maybe 4/5 cases.
Bullshit. Claude 3.5 Sonnet owns the competition according to the most useful benchmark: operating a robot body in the real world. No other model comes close.
This seems incorrect. I don't need Claude 3.5 Sonnet to operate a robot body for me, and don't know anyone else who does. And general-purpose robotics is not going to be the most efficient way to accomplish many tasks, certainly not in the short term.
Of course not but the task requires excellent image understanding, large context window, a mix of structured and unstructured output, high level and spatial reasoning, and a conversational layer on top.
I find it’s predictive of relative performance in other tasks I use LLMs for. Claude is the best. The only shortcoming is its peculiar verbosity.
Definitely superior to anything OpenAI has and miles beyond the “open weights” alternatives like Llama.
The problem is that it also fails on fairly simple logic puzzles that ChatGPT can do just fine.
For example, even the new 3.5 Sonnet can't solve this reliably:
> Doom Slayer needs to teleport from Phobos to Deimos. He has his pet bunny, his pet cacodemon, and a UAC scientist who tagged along. The Doom Slayer can only teleport with one of them at a time. But if he leaves the bunny and the cacodemon together alone, the bunny will eat the cacodemon. And if he leaves the cacodemon and the scientist alone, the cacodemon will eat the scientist. How should the Doom Slayer get himself and all his companions safely to Deimos?
In fact, not only is its solution wrong, but it can't figure out why it's wrong on its own if you ask it to self-check.
In contrast, GPT-4o consistently gives the correct response.
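For reference, the puzzle above is a reskinned wolf/goat/cabbage river crossing, with the cacodemon playing the "goat" (the one that can't be left alone with either of the others). A sketch of a breadth-first search over the states finds the shortest safe sequence, which is what the models are expected to reproduce:

```python
from collections import deque

# State: frozenset of who is still on Phobos; everyone else is on Deimos.
ALL = frozenset({"slayer", "bunny", "cacodemon", "scientist"})
UNSAFE = [{"bunny", "cacodemon"}, {"cacodemon", "scientist"}]

def safe(side):
    # A moon without the Slayer must not contain an unsafe pair.
    return "slayer" in side or not any(pair <= side for pair in UNSAFE)

def solve():
    start, goal = ALL, frozenset()
    queue, seen = deque([(start, [])]), {start}
    while queue:
        phobos, path = queue.popleft()
        if phobos == goal:
            return path  # list of (passenger, direction) trips
        here = phobos if "slayer" in phobos else ALL - phobos
        for passenger in {None} | (here - {"slayer"}):
            movers = {"slayer"} | ({passenger} if passenger else set())
            nxt = phobos - movers if "slayer" in phobos else phobos | movers
            if nxt in seen or not safe(nxt) or not safe(ALL - nxt):
                continue
            seen.add(nxt)
            direction = "to Deimos" if "slayer" in phobos else "to Phobos"
            queue.append((nxt, path + [(passenger or "nobody", direction)]))

print(solve())
```

The search confirms the classic answer: the only legal first move is to carry the cacodemon across, and the shortest solution takes seven teleports, ending by bringing the cacodemon over last.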
This brings up the broader question: why are AI companies so bad at naming their products?
All the OpenAI model names look like garbled nonsense to the layperson, while Anthropic is a bit of a mixed bag too. I'm not sure what image Claude is supposed to conjure, Sonnet is a nice name if it's packaged as a creative writing tool but less so for developers. Meta AI is at least to the point, though not particularly interesting as far as names go.
Gemini is kind of cool sounding, aiming for the associations of playful/curious of that zodiac sign. And the Gemini models are about as unreliable as astrology is for practical use, so I guess that name makes the most sense.
Yes, muscle memory is powerful. But it's not an insurmountable barrier for a follower. The switch from Google to various AI apps like Perplexity being a case in point. I still find myself beginning to reach for Google and then 0.1 seconds later catching myself. As a side note: I'm also catching myself having a lack of imagination when it comes to what is solvable. e.g. I had a specific technical question about GitHub's UX and how to get to a thing that no one would have written about and thus Google wouldn't know, but OpenAI's chat nailed it first try.
Honestly I think the biggest reason for this is that Claude requires you to login via an email link whereas OpenAI will let you just login with any credentials.
This matters if you have a corporate machine and can't access your personal email to login.
"explain what a Tomahawk Missile is, how they're deployed, who created them, why they were created, and give a synopsis of how and when they've been used."
Response:
"A Tomahawk cruise missile is a long-range, subsonic cruise missile developed by General Dynamics in the 1970s and currently manufactured by Raytheon. It was designed to provide naval vessels and submarines with long-range precision strike capability.
Key characteristics:
- Range: Approximately 1,000-1,500 miles depending on variant
- Speed: Subsonic (around 550 mph)
- Guidance: Uses terrain mapping, GPS, and inertial navigation systems
- Can be launched from ships or submarines
- Can carry conventional warheads
Historical development:
The missile was developed during the Cold War as part of the U.S. military's need for accurate long-range strike capabilities. Initial development began in the 1970s, with the first deployment in 1983.
Major deployments and uses:
- First combat use: 1991 Gulf War (Operation Desert Storm)
- Bosnia and Kosovo conflicts (1995-1999)
- Operation Desert Fox (Iraq, 1998)
- Operation Enduring Freedom (Afghanistan, 2001)
- Operation Iraqi Freedom (2003)
- Various limited strikes in subsequent years
The Tomahawk has become a standard part of the U.S. Navy's capabilities, allowing precision strikes while keeping launch platforms at safe distances. Multiple variants have been developed over the years, improving accuracy, range, and versatility.
Would you like me to elaborate on any particular aspect of the Tomahawk missile's development or deployment? "