History has shown being first to market isn't all it's cracked up to be. You spend more, you do the hard work of blazing the trail others will follow, you end up with a tech stack built before tools and patterns stabilized, and you've created a giant superhighway for a fast-follower. Anyone remember MapQuest, AltaVista or Hotmail?
OpenAI has some very serious competition now. When you combine that with the recent destabilizing saga they went through along with commoditization of models with services like OpenRouter.ai, I'm not sure their future is as bright as their recent valuation indicates.
They seem to be going after different markets, or at least to have differing degrees of success in the markets they each target.
OpenAI is most successful in the consumer chat app (ChatGPT) market.
Anthropic is most successful in the business API market.
OpenAI currently has a lot more revenue than Anthropic, but it's mostly from ChatGPT. For API use the revenue numbers of both companies are roughly the same. API success seems more important than chat apps since this will scale with the success of the user's business, and this is really where the dream of an explosion in AI profits comes from.
ChatGPT's user base size vs that of Claude's app may be first mover advantage, or just brand recognition. I use Claude (both web based and iOS app), but still couldn't tell you if the chat product even has a name distinct from the model. How's that for poor branding?! OpenAI have put a lot of effort into the "her" voice interface, while Anthropic's app improvements are more business orientated in terms of artifacts (which OpenAI have now copied) and now code execution.
Just wanted to add a note to this. Tool calling - particularly to source external current data - is something that's had the big foundational LLM providers very nervous so they've held back on it, even though it's trivial to implement at this point. But we're seeing it rapidly emerge with third party providers who use the foundational APIs. Holding back tool calling has limited the complex graph-like execution flows that the big providers could have implemented on their user facing apps e.g. the kind of thing that Perplexity Pro has implemented. So they've fallen behind a bit. They may catch up. If they don't they risk becoming just an API provider.
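To make the mechanics concrete: here's a minimal sketch of the tool-calling loop the comment is describing. The `chat()` function, the message schema, and the `get_weather` tool are all hypothetical stand-ins; real provider APIs (OpenAI, Anthropic) each use their own schemas, but the control flow has this same shape, and chaining several such calls is what produces the graph-like execution flows mentioned above.

```python
import json

def get_weather(city):
    # Hypothetical tool: stand-in for any external, current data source.
    return {"city": city, "temp_c": 21}

TOOLS = {"get_weather": get_weather}

def run_conversation(chat, user_message):
    """Drive a model until it gives a final answer instead of a tool call.

    `chat` is a hypothetical client callable that takes a message list and
    returns either {"tool_call": None, "content": ...} for a final answer
    or {"tool_call": {"name": ..., "arguments": {...}}} for a tool request.
    """
    messages = [{"role": "user", "content": user_message}]
    while True:
        reply = chat(messages)
        if reply.get("tool_call") is None:  # plain answer: we're done
            return reply["content"]
        call = reply["tool_call"]
        result = TOOLS[call["name"]](**call["arguments"])
        # Feed the tool result back so the model can incorporate it.
        messages.append({"role": "tool", "name": call["name"],
                         "content": json.dumps(result)})
```

The loop itself is trivial, which is the comment's point: the hard part for the big providers has been policy, not engineering.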
Most people's first exposure to LLMs was ChatGPT, and that was only what - like 18 months ago it really took off in the mainstream? We're still very early on in the grand scheme of things.
Yes it's silly to talk about first mover advantage in sub 3 years. Maybe in 2026 we can revisit this question and see if being the first mattered.
First mover advantage being a general myth doesn't mean being the first to launch and then immediately dominating the wider market for a long period is impossible. It just usually means their advantage was about a lot more than simply being first.
Claude is only better in some cherry picked standard eval benchmarks, which are becoming more useless every month due to the likelihood of these tests leaking into training data. If you look at the Chatbot Arena rankings where actual users blindly select the best answer from a random choice of models, the top 3 models are all from OpenAI. And the next best ones are from Google and X.
I'm subscribed to all of Claude, Gemini, and ChatGPT. Benchmarks aside, my go-to is always Claude. Subjectively speaking, it consistently gives better results than anything else out there. The only reason I keep the other subscriptions is to check in on them occasionally to see if they've improved.
I don't pay any attention to leaderboards. I pay for both Claude and ChatGPT and use them both daily for anything from Python coding to the most random questions I can think of. In my experience Claude is better (much better) than ChatGPT in almost all use cases. Where ChatGPT shines is the voice assistant - it still feels almost magical having a "human-like" conversation with the AI agent.
Anecdotally, I disagree. Since the release of the "new" 3.5 Sonnet, it has given me consistently better results than Copilot based on GPT-4o.
I've been using LLMs as my rubber duck when I get stuck debugging something and have exhausted my standard avenues. GPT-4o tends to give me very general advice that I have almost always already tried or considered, while Claude is happy to say "this snippet looks potentially incorrect; please verify XYZ" and it has gotten me back on track in maybe 4/5 cases.
Bullshit. Claude 3.5 Sonnet owns the competition according to the most useful benchmark: operating a robot body in the real world. No other model comes close.
This seems incorrect. I don't need Claude 3.5 Sonnet to operate a robot body for me, and don't know anyone else who does. And general-purpose robotics is not going to be the most efficient way to accomplish many tasks, certainly not in the short term.
Of course not but the task requires excellent image understanding, large context window, a mix of structured and unstructured output, high level and spatial reasoning, and a conversational layer on top.
I find it’s predictive of relative performance in other tasks I use LLMs for. Claude is the best. The only shortcoming is its peculiar verbosity.
Definitely superior to anything OpenAI has and miles beyond the “open weights” alternatives like Llama.
The problem is that it also fails on fairly simple logic puzzles that ChatGPT can do just fine.
For example, even the new 3.5 Sonnet can't solve this reliably:
> Doom Slayer needs to teleport from Phobos to Deimos. He has his pet bunny, his pet cacodemon, and a UAC scientist who tagged along. The Doom Slayer can only teleport with one of them at a time. But if he leaves the bunny and the cacodemon together alone, the bunny will eat the cacodemon. And if he leaves the cacodemon and the scientist alone, the cacodemon will eat the scientist. How should the Doom Slayer get himself and all his companions safely to Deimos?
In fact, not only is its solution wrong, but it can't figure out why it's wrong on its own if you ask it to self-check.
In contrast, GPT-4o consistently gives the correct response.
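For reference, the puzzle above is a reskinned wolf/goat/cabbage river crossing, with the cacodemon playing the "goat" (the one that can't be left alone with either of the others). A sketch of a breadth-first search over the states finds the shortest safe sequence, which is what the models are expected to reproduce:

```python
from collections import deque

# State: frozenset of who is still on Phobos; everyone else is on Deimos.
ALL = frozenset({"slayer", "bunny", "cacodemon", "scientist"})
UNSAFE = [{"bunny", "cacodemon"}, {"cacodemon", "scientist"}]

def safe(side):
    # A moon without the Slayer must not contain an unsafe pair.
    return "slayer" in side or not any(pair <= side for pair in UNSAFE)

def solve():
    start, goal = ALL, frozenset()
    queue, seen = deque([(start, [])]), {start}
    while queue:
        phobos, path = queue.popleft()
        if phobos == goal:
            return path  # list of (passenger, direction) trips
        here = phobos if "slayer" in phobos else ALL - phobos
        for passenger in {None} | (here - {"slayer"}):
            movers = {"slayer"} | ({passenger} if passenger else set())
            nxt = phobos - movers if "slayer" in phobos else phobos | movers
            if nxt in seen or not safe(nxt) or not safe(ALL - nxt):
                continue
            seen.add(nxt)
            direction = "to Deimos" if "slayer" in phobos else "to Phobos"
            queue.append((nxt, path + [(passenger or "nobody", direction)]))

print(solve())
```

The search confirms the classic answer: the only legal first move is to carry the cacodemon across, and the shortest solution takes seven teleports, ending by bringing the cacodemon over last.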
This brings up the broader question: why are AI companies so bad at naming their products?
All the OpenAI model names look like garbled nonsense to the layperson, while Anthropic is a bit of a mixed bag too. I'm not sure what image Claude is supposed to conjure, Sonnet is a nice name if it's packaged as a creative writing tool but less so for developers. Meta AI is at least to the point, though not particularly interesting as far as names go.
Gemini is kind of cool sounding, aiming for the associations of playful/curious of that zodiac sign. And the Gemini models are about as unreliable as astrology is for practical use, so I guess that name makes the most sense.
Yes, muscle memory is powerful. But it's not an insurmountable barrier for a follower. The switch from Google to various AI apps like Perplexity being a case in point. I still find myself beginning to reach for Google and then 0.1 seconds later catching myself. As a side note: I'm also catching myself having a lack of imagination when it comes to what is solvable. e.g. I had a specific technical question about GitHub's UX and how to get to a thing that no one would have written about and thus Google wouldn't know, but OpenAI's chat nailed it first try.
Honestly I think the biggest reason for this is that Claude requires you to login via an email link whereas OpenAI will let you just login with any credentials.
This matters if you have a corporate machine and can't access your personal email to login.
"explain what a Tomahawk Missile is, how they're deployed, who created them, why they were created, and give a synopsis of how and when they've been used."
Response:
"A Tomahawk cruise missile is a long-range, subsonic cruise missile developed by General Dynamics in the 1970s and currently manufactured by Raytheon. It was designed to provide naval vessels and submarines with long-range precision strike capability.
Key characteristics:
- Range: Approximately 1,000-1,500 miles depending on variant
- Speed: Subsonic (around 550 mph)
- Guidance: Uses terrain mapping, GPS, and inertial navigation systems
- Can be launched from ships or submarines
- Can carry conventional warheads
Historical development:
The missile was developed during the Cold War as part of the U.S. military's need for accurate long-range strike capabilities. Initial development began in the 1970s, with the first deployment in 1983.
Major deployments and uses:
- First combat use: 1991 Gulf War (Operation Desert Storm)
- Bosnia and Kosovo conflicts (1995-1999)
- Operation Desert Fox (Iraq, 1998)
- Operation Enduring Freedom (Afghanistan, 2001)
- Operation Iraqi Freedom (2003)
- Various limited strikes in subsequent years
The Tomahawk has become a standard part of the U.S. Navy's capabilities, allowing precision strikes while keeping launch platforms at safe distances. Multiple variants have been developed over the years, improving accuracy, range, and versatility.
Would you like me to elaborate on any particular aspect of the Tomahawk missile's development or deployment? "