
These companies full of brilliant engineers are throwing millions of dollars in training costs to produce SOTA models that are... "on par with GPT-4o and Claude Opus"? And then the next 2.23% bump will cost another XX million? It seems increasingly apparent that we are reaching the limits of throwing more data at more GPUs, and that an ARC Prize-level breakthrough is needed to move the needle any further at this point.



> It seems increasingly apparent that we are reaching the limits of throwing more data at more GPUs

Yes. This is exactly why I'm skeptical of AI doomerism/saviorism.

Too many people have been looking at the pace of LLM development over the last two (2) years, modeled it as an exponential growth function, and come to the conclusion that AGI is inevitable in the next ${1-5} years and we're headed for ${(dys|u)topia}.

But all that assumes that we can extrapolate a pattern of long-term exponential growth from less than two years of data. It's simply not possible to project in that way, and we're already seeing that OpenAI has pivoted from improving on GPT-4's benchmarks to reducing cost, while competitors (including free ones) catch up.

All the evidence suggests that the rate of growth in SOTA LLM capabilities has been slowing for at least the past year, which means predictions based on exponential growth all need to be reevaluated.


Notice, though, that all these improvements have come from pretty basic transformer models that output all their tokens-- no internal thoughts, no search, no architecture improvements-- and things are only fed through them once.

But we could add internal thoughts-- we could make the model generate tokens that aren't part of its output but are there to help it better figure out its next token. This was tried with Quiet-STaR.
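
For concreteness, here is a minimal sketch of that idea (my own simplification, not Quiet-STaR's actual training procedure, which as I understand it learns the rationales with a reinforcement-style objective): the model first generates rationale tokens that are hidden from the user, then produces its visible answer conditioned on them. The <thought> markers and the gpt2 stand-in model are illustrative assumptions only.

    # Hedged sketch: hidden "thought" tokens generated before the visible answer.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "gpt2"  # tiny stand-in model, purely for illustration
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    def answer_with_hidden_thoughts(prompt, max_thought_tokens=64, max_answer_tokens=64):
        # Step 1: let the model "think" -- these tokens are never shown to the user.
        thought_prompt = prompt + "\n<thought>"
        thought_ids = model.generate(
            **tok(thought_prompt, return_tensors="pt"),
            max_new_tokens=max_thought_tokens,
            do_sample=True,
            pad_token_id=tok.eos_token_id,
        )
        hidden_context = tok.decode(thought_ids[0], skip_special_tokens=True)

        # Step 2: answer conditioned on the prompt plus the hidden rationale.
        answer_prompt = hidden_context + "\n</thought>\nAnswer:"
        answer_ids = model.generate(
            **tok(answer_prompt, return_tensors="pt"),
            max_new_tokens=max_answer_tokens,
            do_sample=False,
            pad_token_id=tok.eos_token_id,
        )
        full = tok.decode(answer_ids[0], skip_special_tokens=True)
        return full[len(answer_prompt):].strip()  # only the visible answer is returned

    print(answer_with_hidden_thoughts("What is 17 * 24?"))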

Hochreiter is also active with alternative models, and there are all the chip design companies (Groq, Etched, etc.) trying to speed up models and reduce running costs.

Therefore, I think there's room for great improvements. They may not come right away, but there are so many obvious paths to improving things that I think it's unreasonable to conclude progress has stalled. Also, presumably GPT-5 isn't far away.


> But we could add internal thoughts

It feels like there’s an assumption in the community that this will be almost trivial.

I suspect it will be one of the hardest tasks humanity has ever attempted. I'm guessing it has already been tried many times in internal development.

I suspect if you start creating a feedback loop with these models they will tend to become very unstable very fast. We already see with these more linear LLMs that they can be extremely sensitive to the values of parameters like the temperature settings, and can go “crazy” fairly easily.

With feedback loops it could become much harder to prevent these AIs from spinning out of control. And no I don’t mean in the “become an evil paperclip maximiser” kind of way. Just plain unproductive insanity.

I think I can summarise my vision of the future in one sentence: AI psychologists will become a huge profession, and it will be just as difficult and nebulous as being a human psychologist.


I personally think it's not going to be incredibly difficult. Obviously, the way it was done with Quiet-STaR is somewhat expensive, but I see many reasonable approaches here that could be considered.

High temperature will obviously lead to randomness; that's what it does: it evens out the probabilities of the possible next tokens. So obviously a high temperature will make them 'crazy' and a low temperature will lead to deterministic output. People have come up with lots of ideas about sampling, but this isn't really an instability of transformer models.

It's a problem with any model outputting probabilities for different alternative tokens.
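
A minimal illustration of what temperature actually does during sampling (generic softmax code, not tied to any particular model or library):

    # Logits are divided by the temperature before the softmax, so T >> 1 flattens the
    # distribution ("crazy") and T -> 0 approaches greedy, deterministic output.
    import numpy as np

    def next_token_probs(logits, temperature):
        scaled = logits / max(temperature, 1e-8)   # guard against division by zero
        probs = np.exp(scaled - scaled.max())      # numerically stable softmax
        return probs / probs.sum()

    logits = np.array([2.0, 1.0, 0.5, -1.0])
    rng = np.random.default_rng(0)
    for T in (0.1, 1.0, 10.0):
        probs = next_token_probs(logits, T)
        token = rng.choice(len(logits), p=probs)   # sample the next token id
        print(T, np.round(probs, 3), token)        # low T ~ near one-hot, high T ~ near uniform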


>I suspect if you start creating a feedback loop with these models they will tend to become very unstable very fast. We already see with these more linear LLMs that they can be extremely sensitive to the values of parameters like the temperature settings, and can go “crazy” fairly easily.

I'm in the process of spinning out one of these tools into a product: they do not. They become smarter at the price of burning GPU cycles like there's no tomorrow.

I'd go as far as saying we've solved AGI, it's just that the energy budget is larger than the energy budget of the planet currently.


can you link to the overall approach or references for your work?


> Also, presumably GPT-5 isn't far away.

Why do we presume that? People were saying this right before 4o and then what came out was not 5 but instead a major improvement on cost for 4.

Is there any specific reason to believe OpenAI has a model coming soon that will be a major step up in capabilities?


OpenAI have made statements saying they've begun training it, as they explain here: https://openai.com/index/openai-board-forms-safety-and-secur...

I assume that this won't take forever, but will be done this year. A couple of months, not more.


Indeed. All exponential growth curves are sigmoids in disguise.


This is something that is definitionally true in a finite universe, but doesn't carry a lot of useful predictive value in practice unless you can identify when the flattening will occur.

If you have a machine that converts mass into energy and then uses that energy to increase the rate at which it operates, you could rightfully say that it will level off well before consuming all of the mass in the universe. You just can't say that next week after it has consumed all of the mass of Earth.


except when it isn't and we ded :P


I don't think Special Relativity would allow that.


I'm also wondering about the extent to which we are simply burning venture capital versus actually charging subscription prices that are sustainable long-term. It's easy to sell dollars for $0.75, but you can only do that for so long.


> we're already seeing that OpenAI has pivoted from improving on GPT-4's benchmarks to reducing cost, while competitors (including free ones) catch up.

What if they have two teams? One dedicated to optimizing (cost, speed, etc) the current model and a different team working on the next frontier model? I don't think we know the growth curve until we see gpt5.


> I don't think we know the growth curve until we see gpt5.

I'm prepared to be wrong, but I think that the fact that we still haven't seen GPT-5 or even had a proper teaser for it 16 months after GPT-4 is evidence that the growth curve is slowing. The teasers that the media assumed were for GPT-5 seem to have actually been for GPT-4o [0]:

> Lex Fridman(01:06:13) So when is GPT-5 coming out again?

> Sam Altman(01:06:15) I don’t know. That’s the honest answer.

> Lex Fridman(01:06:18) Oh, that’s the honest answer. Blink twice if it’s this year.

> Sam Altman(01:06:30) We will release an amazing new model this year. I don’t know what we’ll call it.

> Lex Fridman(01:06:36) So that goes to the question of, what’s the way we release this thing?

> Sam Altman(01:06:41) We’ll release in the coming months many different things. I think that’d be very cool. I think before we talk about a GPT-5-like model called that, or not called that, or a little bit worse or a little bit better than what you’d expect from a GPT-5, I think we have a lot of other important things to release first.

Note that last response. That's not the sound of a CEO who has an amazing v5 of their product lined up, that's the sound of a CEO who's trying to figure out how to brand the model that they're working on that will be cheaper but not substantially better.

[0] https://arstechnica.com/information-technology/2024/03/opena...


I don't think we are approaching limits, if you take off the English-centric glasses. You can query LLMs about pretty basic questions about Polish language or literature and it's gonna either bullshit or say it doesn't know the answer.

Example:

    w której gwarze jest słowo ekspres i co znaczy?

    Słowo "ekspres" występuje w gwarze śląskiej i oznacza tam ekspres do kawy. Jest to skrót od nazwy "ekspres do kawy", czyli urządzenia służącego do szybkiego przygotowania kawy.

(Translation: "In which dialect is there the word 'ekspres', and what does it mean?" / "The word 'ekspres' appears in the Silesian dialect and there means a coffee machine. It is short for 'ekspres do kawy', i.e., a device for quickly preparing coffee.")

The correct answer is that "ekspres" is a zipper in the Łódź dialect.


What this means is just that Polish support (and probably most other languages besides English) in the models is behind SOTA. We can gradually get those languages closer to SOTA, but that doesn't bring us closer to AGI.


That's just same same but different, not a step change towards significant cognitive ability.


Tbf, you can ask it basic questions in English and it will also bullshit you.


What about synthetic data?


I suspect this is why OpenAI is going more in the direction of optimising for price / latency / whatever with 4o-mini and whatnot. Presumably they found out long before the rest of us did that models can't really get all that much better than what we're approaching now, and once you're there the only thing you can compete on is how many parameters it takes and how cheaply you can serve that to users.


Meta just claimed the opposite in their Llama 3.1 paper. Look at the conclusion. They say that their experience indicates significant gains for the next iteration of models.

The current crop of benchmarks might not reflect these gains, by the way.


I sell widgets. I promise the incalculable power of widgets has yet to be unleashed on the world, but it is tremendous and awesome and we should all be very afraid of widgets taking over the world because I can't see how they won't.

Anyway, here's the sales page. The widget subscription is so premium you won't even miss the subscription fee.


This. It's really weird the way we suddenly live in a world where it's the norm to take whatever a tech company says about future products at face value. This is the same world where Tesla promised "zero intervention LA to NYC self driving" by the end of the year in 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023, and 2024. The same world where we know for a fact that multiple GenAI demos by multiple companies were just completely faked.

It's weird. In the late 2010s it seemed like people were wising up to the idea that you can't implicitly trust big tech companies, even if they have nap pods in the office and have their first-day employees wear funny hats. Then ChatGPT lands and everyone is back to fully trusting these companies when they say they are mere months from turning the world upside down with their AI, something they've been saying every month for the last 12-24 months.


I'm not sure anyone is asking you to take it at face value or implicitly trust them? There's a 92-page paper with details: https://ai.meta.com/research/publications/the-llama-3-herd-o...


> In the late 2010s it seems like people were wising up to the idea that you can't implicitly trust big tech companies

In the 2000s we only had Microsoft, and none of us were confused as to whether to trust Bill Gates or not...


Nobody tells it like Zitron:

https://www.wheresyoured.at/pop-culture/

> What makes this interview – and really, this paper — so remarkable is how thoroughly and aggressively it attacks every bit of marketing collateral the AI movement has. Acemoglu specifically questions the belief that AI models will simply get more powerful as we throw more data and GPU capacity at them, and specifically ask a question: what does it mean to "double AI's capabilities"? How does that actually make something like, say, a customer service rep better? And this is a specific problem with the AI fantasists' spiel. They heavily rely on the idea that not only will these large language models (LLMs) get more powerful, but that getting more powerful will somehow grant it the power to do...something. As Acemoglu says, "what does it mean to double AI's capabilities?"


I don't think claiming that pure scaling of LLMs isn't going to lead to AGI is a particularly hot take. Or that current LLMs don't provide a whole lot of economic value. Obviously, if you were running a research lab you'd be trying a bunch of different things, including pure scaling. It would be weird not to. I don't know if we're going to hit actual AGI in the next decade, but given the progress of the last less-than-decade I don't see why anyone would rule it out. That in itself seems pretty remarkable, and it's not hard to see where the hype is coming from.


Meta just keeps releasing their models as open-source, so that whole line of thinking breaks down quickly.


That line of thinking would not have reached the conclusion that you imply, which is that open source == pure altruism. Having the benefit of hindsight, it’s very difficult for me to believe that. Who knows though!

I’m about Zucks age, and have been following his career/impact since college; it’s been roughly a cosine graph of doing good or evil over time :) I think we’re at 2pi by now, and if you are correct maybe it hockey-sticks up and to the right. I hope so.


I don't think this is a matter of good or evil, simply a matter of business strategy.

If LLMs end up being the platform of the future, Zuck doesn't want OpenAI/Microsoft to be able to monopolize it.


Wouldn't the equivalent for Meta actually be something like:

> Other companies sell widgets. We have a bunch of widget-making machines and so we released a whole bunch of free widgets. We noticed that the widgets got better the more we made and expect widgets to become even better in future. Anyway here's the free download.

Given that Meta isn't actually selling their models?

Your response might make sense if it were to something OpenAI or Anthropic said, but as is I can't say I follow the analogy.


That would make sense if it were from OpenAI, but Meta doesn't actually sell these widgets. They release the widget machines for free in the hopes that other people will build a widget ecosystem around them to rival the closed widget ecosystem that threatens to lock them out of a potential "next platform" powered by widgets.


Meta doesn't sell widgets in this scenario - they give them away for free. Their competition sells widgets, so Meta would be perfectly happy if the widget market totally collapsed.


That is a strong (and fun) point, but this is peer reviewable and has more open collaboration elements than purely selling widgets.

We should still be skeptical, because people often want to claim to be better or to have answers they haven't earned, but I don't think the motive to lie is quite as strong as a salesman's.


> this is peer reviewable

It's not peer-reviewable in any shape or form.


Others can build models that try to have decent performance with a lower number of parameters. If they match what is in the paper, that is the crudest form of review, but Mistral is releasing some models (this one?), so this can get more nuanced if needed.

That said, doing that is slow and people will need to make decisions before that is done.


So, the best you can do is "the crudest form of review"?


It is kind of "peer-reviewable" in the "Elon Musk vs Yann LeCun" form, but I doubt that the original commenter meant this.


Except: Meta doesn't sell AI at all. Zuck is just doing this for two reasons:

- flex

- deal a blow to Altman


Meta uses AI in all the recommendation algorithms. They absolutely hope to turn their chat assistants into a product on WhatsApp too, and GenAI is crucial to creating the metaverse. This isn't just a charity case.


AI isn't a single thing: of course Meta didn't buy thousands of GPUs for fun.

But it has nothing to do with LLMs (and interestingly enough they aren't opening their recommendation tech).


There are literal ads for Meta AI on television. The idea that they're not selling something is absurd.


If OpenAI was saying this you'd have a point but I wouldn't call Facebook a widget seller in this case when they're giving their widgets away for free.


But Meta isn't selling it


They also said in the paper that 405B was only trained to "compute-optimal", unlike the smaller models, which were trained well past that point. That indicates the larger model still had some runway, so had they continued, it would have kept getting stronger.
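
As a rough illustration of what "compute-optimal" means here, in the Chinchilla sense (rule-of-thumb numbers only, not figures from the Llama paper): the common heuristic is roughly 20 training tokens per parameter, with training compute approximated as 6 * N * D FLOPs.

    # Back-of-the-envelope Chinchilla-style arithmetic; the constants are rules of thumb.
    params = 405e9                  # a 405B-parameter model
    tokens = 20 * params            # ~8.1e12 tokens for roughly "compute-optimal" training
    flops = 6 * params * tokens     # ~2.0e25 FLOPs of training compute
    print(f"{tokens:.1e} tokens, {flops:.1e} FLOPs")

Training well past that token count (as the smaller Llama models were) keeps improving the model, just with diminishing returns per FLOP.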


Makes sense right? Otherwise why make a model so large that nobody can conceivably run it if not to optimize for performance on a limited dataset/compute? It was always a distillation source model, not a production one.


LLMs are reaching saturation on even some of the latest benchmarks and yet I am still a little disappointed by how they perform in practice.

They are by no means bad, but I am now mostly interested in long context competency. We need benchmarks that force the LLM to complete multiple tasks simultaneously in one super long session.


I don't know anything about AI but there's one thing I want it to do for me: program a long-term full-body exercise plan based on the parameters I give it, such as available equipment, past workouts, and goals. I haven't had good success with ChatGPT, but I assume what you're talking about is relevant to my goals.


Aren't there apps that already do this like Fitbod?


Fitbod might do the trick. Thanks! The availability of equipment was a difficult thing for me to incorporate into a fitness program.


Yeah, but what does that actually mean? That if they had simply doubled the parameters on Llama 405b it would score way better on benchmarks and become the new state-of-the-art by a long mile?

I mean, going by their own model evals on various benchmarks (https://llama.meta.com/), Llama 405b scores anywhere from a few points to almost 10 points more than Llama 70b, even though the former has ~5.5x more params. As far as scale is concerned, the relationship isn't even linear.

Which in most cases makes sense: you obviously can't score above 100% on these benchmarks, so if the smaller model is already at ~95% or whatever then there isn't much room for improvement. There is, however, the GPQA benchmark. Whereas Llama 70b scores ~47%, Llama 405b only scores ~51%. That's not a huge improvement despite the significant difference in size.

Most likely, we're going to see improvements in small model performance by way of better data. Otherwise, though, I fail to see how we're supposed to get significantly better model performance by way of scale when the relationship between model size and benchmark scores is nowhere near linear. I really wish someone who's on team "scale is all you need" could help me see what I'm missing.

And of course we might find some breakthrough that enables actual reasoning in models or whatever, but I find that purely speculative at this point, anything but inevitable.


Or maybe they just want to avoid getting sued by shareholders for dumping so much money into unproven technology that ended up being the same or worse than the competitor


> the only thing you can compete on is how many parameters it takes and how cheaply you can serve that to users.

The problem with this strategy is that it's really tough to compete with open models in this space over the long run.

If you look at OpenAI's homepage right now they're trying to promote "ChatGPT on your desktop", so it's clear even they realize that most people are looking for a local product. But once again this is a problem for them because open models run locally are always going to offer more in terms of privacy and features.

In order for proprietary models served through an API to compete long term they need to offer significant performance improvements over open/local offerings, but that gap has been perpetually shrinking.

On an M3 macbook pro you can run open models easily for free that perform close enough to OpenAI that I can use them as my primary LLM for effectively free with complete privacy and lots of room for improvement if I want to dive into the details. Ollama today is pretty much easier to install than just logging into ChatGPT and the performance feels a bit more responsive for most tasks. If I'm doing a serious LLM project I most certainly won't use proprietary models because the control I have over the model is too limited.

At this point I have completely stopped using proprietary LLMs despite working with LLMs everyday. Honestly can't understand any serious software engineer who wouldn't use open models (again the control and tooling provided is just so much better), and for less technical users it's getting easier and easier to just run open models locally.
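
For what it's worth, here is roughly what "running an open model locally" looks like from code, assuming a stock Ollama install serving its default local endpoint and a model that has already been pulled (the model name here is just an example):

    # Query a locally served open model via Ollama's HTTP API (default port 11434).
    import json
    import urllib.request

    payload = json.dumps({
        "model": "llama3.1",   # assumed to have been fetched earlier with `ollama pull`
        "prompt": "Summarize the tradeoffs of local vs hosted LLMs in two sentences.",
        "stream": False,
    }).encode()

    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["response"])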


In the long run, maybe, but it will probably take 5 years or more before laptops like an M3 MacBook with 64 GB of RAM are mainstream. It's also going to take a while before 70B-param models are bundled with Windows and macOS system updates, and even more time before you have such models on your smartphone.

OpenAI made a good move by making GPT-4o mini so dirt cheap that it's faster and cheaper to run than Llama 3.1 70B. Most consumers will interact with LLMs via apps using an LLM API, a web panel on desktop, or a native mobile app, for the same reason most people use Gmail etc. instead of a native email client. Setting up IMAP, POP, etc. is out of reach for most people, much like installing Ollama + Docker + OpenWebUI.

App developers are not going to bet on local LLMs only, as long as they are not mainstream and preinstalled on 50%+ of devices.


I think their desktop app still runs the actual LLM queries remotely.


This. It's a mac port of the iOS app. Using the API.


Totally. I wrote about this when they announced their dev-day stuff.

In my opinion, they've found that intelligence with the current architecture is actually an S-curve and not an exponential, so they're trying to make progress in other directions: UX and EQ.

https://nicholascharriere.com/blog/thoughts-openai-spring-re...


Indeed. I pointed out in https://buttondown.email/ainews/archive/ainews-llama-31-the-... that the frontier model curve is currently going down 1 OoM every 4 months, meaning every model release has a very short half-life [0]. However, this progress is still worth it if we can deploy it to improve millions and eventually billions of people's lives. A commenter pointed out that the amount spent on Llama 3.1 was only about 60% of the cost of Ant-Man and the Wasp: Quantumania, in which case I'd advocate for killing all Marvel slop and dumping all that budget on LLM progress.

[0] not technically complete depreciation, since for example 4o mini is widely believed to be a distillation of 4o, so 4o's investment still carries over into 4o mini


All that Marvel slop was created by the first real LLM: <https://marvelcinematicuniverse.fandom.com/wiki/K.E.V.I.N.>


> however this progress is still worth it if we can deploy it to improve millions and eventually billions of people's lives

Has there been any indication that we're improving the lives of millions of people?


Just me coding 30% faster is worth it


I haven't found a single coding problem where any of these coding assistants were anything but annoying.

If I need to babysit a junior developer fresh out of school and review every single line of code it spits out, I can find them elsewhere


Yes, just like the internet, power users have found use cases. It'll take education/habit for general users.


Ah yes. We're in the crypto stages of "it's like the internet".


Agreed on everything, but calling the Marvel movies slop… I think that word has gone too far.


Not all Marvel films are slop. But, as a fan who comes from a family of fans and someone who has watched almost all of them, let's be real: that particular film, and really most of them, contain copious amounts of what is absolutely slop.

I don't know if the utility is worse than an LLM that is SOTA for 2 months and that no one even bothers switching to, however; at least the Marvel slop is being used for entertainment by someone. I think the market is definitely prioritizing the LLM researcher over Disney's latest slop sequel, though, so whoever made that comparison can rest easy, because we'll find out.


>really and most of them, contain copious amounts of what is absolutely slop.

I thought that was the allure, something that's camp funny and an easy watch.

I have only watched a few of them so I am not fully familiar?


Not only are Marvel movies slop, they are very concentrated slop. The only way to increase the concentration of slop in a Marvel movie would be to ask ChatGPT to write the next one.


The Marvel movies are the genesis of this use of the word slop.


Can you back that claim up with a link or similar?


It’s junk food. No one is disputing how tasty it is though (including the recent garbage).


The thing I don't understand is why everyone is throwing money at LLMs for language when there are much simpler use cases that are more useful.

For example, has anyone ever attempted image -> html/css model? Seems like it be great if I can draw something on a piece of paper and have it generate a website view for me.


Perhaps if we think of LLMs as search engines (Google, Bing etc) then there's more money to be made by being the top generic search engine than the top specialized one (code search, papers search etc)


This is the real MVP of LLMs for me. Compressing Google Search AND the internet into an 8 GB download is something that would have been unfathomable to me two decades ago.

My hope now is that someone will figure out a way to separate intelligence from knowledge - i.e. train a model that knows how to interpret the weights of other models - so that training new intelligent models wouldn't require training them on a petabyte of data every run.


> has anyone ever attempted image -> html/css model?

I had a discussion with a friend about doing this, but for CNC code. The answer was that a model trained on a narrow data set underperforms one trained on a large data set and then fine tuned with the narrow one.


All of the multi-modal LLMs are reasonably good at this.


They did that in the GPT-4 demo 1.5 years ago. https://www.youtube.com/watch?v=GylMu1wF9hw


I was under the impression that you could more or less do something like that with the existing LLMs?

(May work poorly of course, and the sample I think I saw a year ago may well be cherry picked)


>For example, has anyone ever attempted image -> html/css model?

Have you tried uploading the image to an LLM with vision capabilities like GPT-4o or Claude 3.5 Sonnet?


I tried, and Sonnet 3.5 can copy most common UIs.


> For example, has anyone ever attempted image -> html/css model?

There are already companies selling services where they generate entire frontend applications from vague natural language inputs.

https://vercel.com/blog/announcing-v0-generative-ui


Not sure why you think interpreting a hand drawing is "simpler" than parsing sequential text.


That's a thought I had. For example, could a model be trained to take a description, and create a Blender (or whatever other software) model from it? I have no idea how LLMs really work under the hood, so please tell me if this is nonsense.


I'm waiting for exactly this; GPT-4 trips up a lot with Blender currently (nonsensical order of operations, etc.).


I think GPT5 will be the signal of whether or not we have hit a plateau. The space is still rapidly developing, and while large model gains are getting harder to pick apart, there have been enormous gains in the capabilities of lightweight models.


> I think GPT5 will be the signal of whether or not we have hit a plateau.

I think GPT5 will tell if OpenAI hit a plateau.

Sam Altman has been quoted as claiming "GPT-3 had the intelligence of a toddler, GPT-4 was more similar to a smart high-schooler, and that the next generation will look to have PhD-level intelligence (in certain tasks)"

Notice the high degree of upselling based on vague claims of performance, and the fact that the jump from highschooler to PhD can very well be far less impressive than the jump from toddler to high schooler. In addition, notice the use of weasel words to frame expectations regarding "the next generation" to limit these gains to corner cases.

There's some degree of salesmanship in the way these models are presented, but even between the hyperboles you don't see claims of transformative changes.


>some degree of salesmanship

buddy every few weeks one of these bozos is telling us their product is literally going to eclipse humanity and we should all start fearing the inevitable great collapse.

It's like how no one owns a car anymore because of AI driving, and I don't have to tell you about the great bank disaster of 2019, when we all had to accept that fiat currency is over.

You've got to be a particular kind of unfortunate to believe it when sam altman says literally anything.


PhD level-of-task-execution sounds like the LLM will debate whether the task is ethical instead of actually doing it


I wish I could frame this comment


lol! Producing academic papers for future training runs then.


Basically every single word out of Mr Worldcoin's mouth is a scam of some sort.


I’m waiting for the same signal. There are essentially 2 vastly different states of the world depending on whether GPT-5 is an incremental change vs a step change compared to GPT-4.


Which is why they'll keep calling the next few models GPT4.X


The next iteration depends on NVIDIA & co.; what we need is sparse libs. Most of the weights in LLMs are 0, and once we deal with those more efficiently we will get to the next iteration.
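
A toy illustration of what sparse kernels buy you, under the parent's assumption of a very high zero fraction (this is not a claim about any real model's weights):

    # Dense vs CSR storage and matmul for a weight matrix that is ~90% zeros.
    import numpy as np
    from scipy import sparse

    rng = np.random.default_rng(0)
    dense = rng.standard_normal((4096, 4096)).astype(np.float32)
    dense[rng.random(dense.shape) < 0.9] = 0.0          # pretend 90% of weights are zero

    csr = sparse.csr_matrix(dense)                      # stores only the nonzero entries
    x = rng.standard_normal((4096, 1)).astype(np.float32)

    y_dense = dense @ x                                 # dense matmul touches every entry
    y_sparse = csr @ x                                  # sparse matmul touches only ~10% of them

    print(dense.nbytes)                                 # ~64 MB for dense float32 weights
    print(csr.data.nbytes + csr.indices.nbytes + csr.indptr.nbytes)  # ~13 MB: nonzeros + index arrays
    print(np.allclose(y_dense, y_sparse, atol=1e-3))    # same result either way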


> Most of the weights in llms are 0,

That's interesting. Do you have a rough percentage for this?

Does this mean these connections have no influence at all on output?


My uneducated guess is that with many layers you can implement something akin to a graph in the brain by nulling lots of previous layers' outputs. I actually suspect that current models aren't optimal with layers all of the same size, but I know shit.


This is quite intuitive. We know that a biological neural net is a graph data structure. And ML systems on GPUs are more like layers of bitmaps in Photoshop (it's a graphics processor). So if most of the layers are akin to transparent pixels, in order to build a graph by stacking, that's hyper memory inefficient.


What else can be done?

If you are sitting on $1 billion of GPU capex, what's $50 million in energy/training cost for another incremental run that may beat the leaderboard?

Over the last few years the market has placed its bets that this stuff will make gobs of money somehow. We're all not sure how. They're probably thinking -- it's likely that whoever has a few % is going to sweep and take most of this hypothetical value. What's another few million, especially if you already have the GPUs?

I think you're right -- we are towards the right end of the sigmoid. And with no "killer app" in sight. It is great for all of us that they have created all this value, because I don't think anyone will be able to capture it. They certainly haven't yet.


And even if there is another breakthrough, all of these companies will implement it more or less simultaneously and they will remain in a dead heat.


Presuming the breakthrough is openly shared. It remains surprising how transparent many of these companies are about new approaches that push the SoTA forward, and I suspect we're going to see a change: companies won't reveal the secret sauce so readily.

e.g. Almost the entire market relies upon the "Attention Is All You Need" paper detailing transformers, and it would be an entirely different market if Google had held that as a trade secret.


Given how absolutely pitiful the proprietary advancements in AI have been, I would posit we have little to worry about.


OTOH the companies who are sharing their breakthroughs openly aren't yet making any money, so something has to give. Their research is currently being bankrolled by investors who assume there will be returns eventually, and eventually can only be kicked down the road for so long.


Sort of yes, sort of no.

Of course, I agree that Stability AI made Stable Diffusion freely available and they're worth orders of magnitude less than OpenAI. To the point they're struggling to keep the lights on.

But it doesn't necessarily make that much difference whether you openly share the inner technical details. When you've got a motivated and well financed competitor, merely demonstrating a given feature is possible, showing the output and performance and price, might be enough.

If OpenAI adds a feature, who's to say Google and Facebook can't match it even though they can't access the code?


Well, that's because the potential reward from picking the right horse is MASSIVE and the cost of potentially missing out is lifelong regret. Investors are driven by FOMO more than anything else. They know most of these will be duds but one of these duds could turn out to be life changing. So they will keep bankrolling as long as they have the money.


Eventually can be (and has been) bankrolled by Nvidia. They did a lot of ground-floor research on GANs and training optimization, which only makes sense to release as public research. Similarly, Meta and Google are both well-incentivized to share their research through Pytorch and Tensorflow respectively.

I really am not expecting Apple or Microsoft to discover AGI and ferret it away for profitability purposes. Strictly speaking, I don't think superhuman intelligence even exists in the ___domain of text generation.


Anthropic has been very secretive about the supposed synthetic data they used to train 3.5 Sonnet.

Given how good the model is terms of the quality vs speed tradeoff, they must have something.


>Attention Is All You Need paper detailing transformers, and it would be an entirely different market if Google had held that as a trade secret.

I would guess that in that timeline, Google would never have been able to learn about the incredible capabilities of transformer models outside of translation, at least not until much later.


For some time, we have been at a plateau because everyone has caught up, which essentially means that everyone now has good training datasets and uses similar tweaks to the architecture. It seems that, besides new modalities, transformers might be a dead end as an architecture. Better scores on benchmarks result from better training data and fine-tuning. The so-called 'agents' and 'function calling' also boil down to training data and fine-tuning.


Benchmark scores aren't a good guide, because they were designed around previous generations of LLMs. That 2.23% uptick can actually represent a world of difference in subjective tests and definitely be worth the investment.

Progress is not slowing down but it gets harder to quantify.


There are different directions in which AI has lots of room to improve: multi-modal, which branches into robotics, and single-modal, like image, video, and sound generation and understanding. Also, I would check back when OpenAI releases 5.


We always needed a tock to see real advancement, like with the last model generation. The tick we had with the H100 was enough to bring these models to market, but that's it.


For this model, it seems like the point is that it uses far fewer parameters than at least the large Llama model while having near-identical performance. Given how large these models are getting, this is an important thing to do before making performance better again.


And with the increasing parameter size, the main winner will be Nvidia.

Frankly I just don't understand the economics of training a foundation model. I'd rather own an airline. At least I can get a few years out of the capital investment of a plane.


But billionaires already have that; they want a chance at getting their own god.


I think it’s impressive that they’re doing it on a single (large) node. Costs matter. Efficiency improvements like this will probably increase capabilities eventually.

I’m also optimistic about building better (rather than bigger) datasets to train on.


I don't think we can conclude that until someone trains a model that is significantly bigger than GPT-4.


This is already what the Chinchilla paper surmised; it's no wonder that their prediction now comes to fruition. It is like an accelerated version of Moore's Law, because software development itself moves faster than hardware development.


> It seems increasingly apparent that we are reaching the limits of throwing more data at more GPUs;

I think you're just seeing the "make it work" stage of the combo "first make it work, then make it fast".

Time to market is critical, as you can attest by the fact you framed the situation as "on par with GPT-4o and Claude Opus". You're seeing huge investments because being the first to get a working model stands to benefit greatly. You can only assess models that exist, and for that you need to train them at a huge computational cost.


ChatGPT is like Google now. It is the default. Even if Claude becomes as good as ChatGPT or even slightly better it won't make me switch. It has to be like a lot better. Way better.

It feels like ChatGPT won the time to market war already.


But plenty of people switched to Claude, especially with Sonnet 3.5. Many of them are in this very thread.

You may be right with the average person on the street, but I wonder how many have lost interest in LLM usage and cancelled their GPT plus sub.


-1: I know many people who are switching to Claude. And Google makes it near-zero friction to adopt Gemini with Gsuite. And more still are using the top-N of them.

This is similar to the early days of the search engine wars, the browser wars, and other categories where a user can easily adopt, switch between and use multiple. It's not like the cellphone OS/hardware war, PC war and database war where (most) users can only adopt one platform at a time and/or there's a heavy platform investment.


If ChatGPT fails to do a task you want, your instinct isn't "I'll run the prompt through Claude and see if it works" but "oh well, who needs LLMs?"


Please don't assume your experience applies to everyone. If ChatGPT can't do what I want, my first reaction is to ask Claude for the same thing, often to find out that Claude performs much better. I've already cancelled ChatGPT Plus for exactly that reason.


You just did that Internet thing where someone reads the reply someone wrote without the comment they are replying to, completely misunderstanding the conversation.


Eh, with the degradation of coding performance in ChatGPT I made the switch. Seems much better to work with on problems, and I have to do way less hand holding to get good results.

I'll switch again as soon as something better is out.



