I think the fact that all (good) LLM datasets are full of licensed/pirated material means we'll never really see a decent open source model under the strict definition. Open weights + open source code is really the best we're going to get, so I'm fine with that co-opting the term open source even if it doesn't fully apply.
> we'll never really see a decent open source model under the strict definition
But there are already a bunch of models like that, where everything (architecture, training data, training scripts, etc.) is open, public and transparent. Since you weren't aware those existed before, but now you know they do, are you willing to change your perspective on it?
> so I'm fine with it coopting the term open source even if it doesn't fully apply
It really sucks that the community seems OK with this. I probably wouldn't have been a developer without FOSS, and I don't understand how it can seem OK to rob other people of this opportunity to learn from FOSS projects.
Not all of the community is OK with this, lots of folks are strongly against OSI's bullshit OSAID for example. Really it should have been more like the Debian Deep Learning Team's Machine Learning Policy, just like last time when the OSI used the Debian Free Software Guidelines (DFSG) to create the Open Source Definition (OSD).
The reason is that the usage is completely different from coroutine based async. With GPUs you want to queue _as many async operations as possible_ and only then synchronize. That is, you would have a program like this (pseudocode):
b = foo(a)
c = bar(b)
d = baz(c)
synchronize()
With coroutines/async await, something like this
b = await foo(a)
c = await bar(b)
d = await baz(c)
would synchronize after every step, which is much less efficient.
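Here's a minimal runnable sketch of the queue-then-synchronize pattern using CuPy (my choice for illustration; foo/bar/baz become arbitrary elementwise ops):

    import cupy as cp

    a = cp.arange(1 << 20, dtype=cp.float32)

    # Each line only *enqueues* a kernel on the current CUDA stream
    # and returns immediately; the host is not blocked.
    b = a * 2.0      # stands in for foo
    c = cp.sqrt(b)   # stands in for bar
    d = c + 1.0      # stands in for baz

    # Only now do we block until everything queued has finished.
    cp.cuda.Stream.null.synchronize()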
It really depends on whether you're dealing with an async stream or a single async result as the input to the next function. If a is an access token needed to access resource b, you cannot fetch a and b at the same time. You have to serialize your operations.
Well, you can and should create multiple coroutines/tasks and then gather them. If you replace CUDA with network calls, it's exactly the same problem. Nothing specific to asyncio.
No, that's a different scenario. In the one I gave there's explicitly a dependency between requests. If you use gather, the network requests would be executed in parallel. If you have dependencies they're sequential by nature because later ones depend on values of former ones.
The 'trick' for CUDA is that you declare all this using buffers as inputs/outputs rather than values and that there's automatic ordering enforcement through CUDA's stream mechanism. Marrying that with the coroutine mechanism just doesn't really make sense.
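To make the dependency point concrete, here's a minimal asyncio sketch (fetch is a hypothetical stand-in for a network call):

    import asyncio

    async def fetch(x):
        await asyncio.sleep(1)  # pretend this is a network round-trip
        return x + 1

    async def main():
        # Dependent: each call needs the previous result, so the
        # awaits are inherently sequential (~3s total).
        b = await fetch(0)
        c = await fetch(b)
        d = await fetch(c)

        # Independent: no data dependencies, so gather runs them
        # concurrently (~1s total).
        x, y, z = await asyncio.gather(fetch(1), fetch(2), fetch(3))

    asyncio.run(main())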
If you compare with e.g. DeepSeek and other hosts, you'll find that OpenAI is actually almost certainly charging very high margins (DeepSeek has an 80% profit margin and they're 10x cheaper than OpenAI).
The training/R&D might make OpenAI burn VC cash, but this isn't comparable to companies like WeWork, whose products actively burn cash.
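Back-of-envelope on those figures (the 80% margin and 10x price gap are from this thread; assuming OpenAI's serving cost per token is comparable to DeepSeek's, which is a big assumption on my part):

    deepseek_price = 1.0                  # normalized price per token
    deepseek_cost = deepseek_price * 0.2  # implied by the 80% margin
    openai_price = 10 * deepseek_price    # "10x cheaper" the other way

    implied_margin = 1 - deepseek_cost / openai_price
    print(implied_margin)  # 0.98, i.e. ~98% margin under these assumptions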
On their subscriptions, specifically the Pro subscription, because it's a flat rate for their most expensive model. The API prices are all much higher. It's unclear whether they're losing money on the normal subscriptions, but if so, probably not by much. Though it's definitely closer to what you described: subsidizing it to gain 'mindshare' or whatever.
Well, I think there are currently many models that are a better bang for the buck than GPT-4o, both per token and per unit of intelligence. Other than OpenAI offering very high rate limits and throughput without a contract negotiated with sales, I don't see much reason to use it instead of Sonnet 3.5 or 3.7, or Google's Flash 2.0.
Perhaps their training cost and their current inference cost are higher, but what you get as a customer is a more expensive product for what it is, IMO.
They surely lose money in some months on some customers, but I expect that globally most subscribers (including me; I recently cancelled) would be much better off migrating to the API.
Everyone I know who has or had a subscription didn't use it very extensively, and that's how it stays profitable overall.
I suspect it's the same for Copilot, especially the business variant. While they definitely lose money on my account, looking at our whole company subscription I wouldn't be surprised if actual usage costs only 30% of what we pay.
Do you take issue with the 'purely empirical' approach (just trying out variants and seeing which sticks) or only with its insufficient documentation?
I don't know how you'd improve on the former. For a lot of it there simply isn't any sound theoretical foundation, so you just end up with flimsy post-hoc rationalizations.
While I agree that it's unfortunate that people often just present magic numbers without explaining where they come from, in my experience providing documentation for how one arrives at these often enough gets punished because it draws more attention to them. That is, reviewers will e.g. complain about preliminary experiments, asking for theoretical analysis or question why only certain variants were tried, whereas magic numbers are just kind of accepted.
Seems pretty clear they aren't objecting to throwing stuff at the wall and seeing what sticks, but with calling the outcome of sticky-wall work "science".
I'd say that's a bit of a strict take on science; one could be generous and compare it to a biologist going out into the forest and coming back with a report on finding a new lichen.
Though admittedly these days the biologist is probably expected to report details about their search strategy, which the sticky-wall researchers don't.
The biologist would be expected to describe the lichen in detail, including where it was found, its expected ecology, its place in the ecosystem, life cycle, structure, etc. It is no longer 1696, when we could go spear some hapless fish, bring back its desiccated body, and let our fellow gentlemen ogle at its weirdness.
I'm not GP, but I don't think they are taking issue with the fact that e.g. layer counts or architecture were arrived at empirically rather than from first principles.
Rather, that when you do come to something empirically, you need to validate your findings by e.g. ablations, hypothesis testing, case studies, etc.
> I don't know how you'd improve on the former. For a lot of it there simply isn't any sound theoretical foundation, so you just end up with flimsy post-hoc rationalizations.
So great science would come up with a sound theoretical foundation, or at least strong arguments as to why no such foundation can exist.
There are even much cheaper services that host it for only slightly more than DeepSeek itself [1]. I'm now very certain that DeepSeek is not offering the API at a loss, so either OpenAI has absurd margins or their model is much more expensive to run.
[1] The cheapest I've found, which also happens to run in the EU, is https://studio.nebius.ai/ at $0.8/million input tokens.
Edit: I just saw that OpenRouter also now has Nebius.
Yes, sorry, I was being maximally-broad in my comment but I would think it's very, very, very likely that OpenAI is currently charging huge margins and markups to help maintain the cachet / exclusivity / and, in some senses, safety of their service. Charging more money for access to their models feels like a pretty big part of their moat.
Also possibly b/c of their sweetheart deal with Azure they've never needed to negotiate enterprise pricing so they're probably calculating margins based on GPU list prices or something insane like that.
The "97.3%" match is probably just the confidence value - I don't think a frequentist interpretation makes sense for this. I'm not an expert in face recognition, but these systems are very accurate, typically like >99.5% accuracy with most of the errors coming from recall rather than precision. They're also not _that_ expensive. Real-time detection on embedded devices has been possible for around a decade and costs for high quality detection have come down a lot in recent years.
Still, you're right that at those scales these systems will invariably slip once in a while, and it's scary to think that this might be enough to be considered a criminal, especially because people often treat these systems as infallible.
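The base-rate problem is easy to see with made-up but plausible numbers:

    # Illustrative numbers only (assumed, not from the article).
    fpr = 0.001              # 0.1% false-positive rate, i.e. 99.9% specificity
    comparisons = 1_000_000  # faces scanned against a watchlist per day

    false_matches = fpr * comparisons
    print(false_matches)  # 1000.0 -- a thousand innocent matches per day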
I wonder if that can be avoided by modifying the training approach. Ideas offhand: group data by topic and train a subset of weights per node; or figure out which layers have the most divergence and reduce the lr on those only.
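For the second idea, a minimal PyTorch sketch of per-layer learning rates via parameter groups (the divergence measure here is a placeholder; a real version would compare weights across nodes):

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 10))

    # Placeholder divergence score per parameter tensor; in practice
    # you'd measure the variance of each layer's weights across workers.
    divergence = {name: p.detach().var().item()
                  for name, p in model.named_parameters()}
    threshold = sorted(divergence.values())[len(divergence) // 2]  # median

    # One group per tensor: halve the lr where divergence is high.
    groups = [{"params": [p],
               "lr": 1e-3 if divergence[n] <= threshold else 5e-4}
              for n, p in model.named_parameters()]
    optimizer = torch.optim.SGD(groups, lr=1e-3)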
Really sad to see. Emad pivoting to posting crypto nonsense on twitter made me think the writing is on the wall for Stability, but I still didn't expect it so soon. I expect they'll pivot to closed models and then fade away.
> Really sad to see. Emad pivoting to posting crypto nonsense on twitter made me think the writing is on the wall for Stability
Open-source AI is a race to zero that makes little money, and Stability was facing lawsuits (especially from Getty) with costs mounting into the millions while the company was already burning tens of millions.
Despite being the actual "Open AI", Stability cannot afford to sustain itself doing so.
Wow, much of what he's saying in that video is a lie with regard to the history of latent diffusion, creating an open-source GPT-3, etc. He's just taking credit for a bunch of work he didn't have much to do with.
You know, it never ceases to amaze me how even the most respected fall prey to this money laundering scheme. If people spent even a little time reading about Tether, they would not touch this stuff. It's blood money.
You should probably post some citations for that. Tether most probably has backing for every single dollar they have, and then plenty more. What about it is "blood money"?
I mean, is AI a less sketchy space in 2024 than crypto/blockchain in 2024? Two or three years ago sure, I guess, but today?
The drama around OpenAI is well documented, there are multiple lawsuits and an SEC investigation at least in embryo, Karpathy bounced and Ilya's harder to spot than Kate Middleton (edit: please see below edit in regards to this tasteless quip). NVIDIA is pushing the Dutch East India Company by some measures of profitability with AMD's full cooperation: George Hotz doesn't knuckle under to the man easily and he's thrown in the towel on ever getting usable drivers on "gaming"-class gear. At least now I guess the Su-Huang Thanksgiving dinners will be less awkward.
Of the now over a dozen FAANG/AI "boomerangs" I know, all of them predate COVID hiring or whatever, and all get crammed down on RSU grants they accumulated over years: whether or not phone calls got made, it's pretty clearly on everyone's agenda to wash all the ESOP out, neutron-bomb the Peninsula, and then hire everyone back at dramatically lower TC, all while blowing out EPS quarter after quarter.
Meanwhile the FOMC is openly talking about looser labor markets via open market operations (that's direct government interference in free labor markets to suppress wages, for the pro-capitalism folks; think a little about what capitalism is supposed to mean if you are OK with this). And this against the backdrop of an election between two men having trouble campaigning effectively: one is fighting off dozens of lawsuits including multiple felony charges, and the other is flying back and forth between Kiev and Tel Aviv trying to manage two wars he can't seem to manage. IIRC Biden is in Ukraine right now trying to keep Zelenskyy from drone-bombing any more refineries of Urals crude, because `CL` or whatever is up like 5% in the last three weeks, which is really bad in an election year looking to get nothing but uglier. Is anyone really arguing that some meme on /r/crypto is what's pushing e.g. BTC, and not a pretty shaky-looking Fed?
Meanwhile over in crypto land, over the same period of time that AI and other marquee Valley tech has been turning into a scandal-plagued orgy of ugly headlines on a nearly daily basis, the regulators have actually been getting serious about sending bad actors to jail or leaning on them with the prospect (SBF, CZ), major ETFs and futures serviced by reputable exchanges (e.g. CME) have entered mainstream portfolios, and a new generation of exchanges (`dy/dx`, Vertex, Apex, Orderly) backed by conventional finance investments in robust bridge infrastructure (LayerZero) are now doing standard Island/ARCA-style efficient matching and then using the blockchain for what it's for: printing a Reg NMS/NBBO/SIP-style consolidated tape.
As a freelancer I don't really have a dog in this fight; I judge projects by feasibility, compensation, and minimum ick factor. From the vantage point of my deal flow, the AI projects look sketchier on average and bid below market on average compared to the blockchain projects, a stark reversal from even six months ago.
Edit: I just saw the news about Kate Middleton, I was unaware of this when I wrote the above which is in extremely poor taste in light of that news. My thoughts and prayers are with her and her family.
I and many, many people around me use ChatGPT every single day in our lives. AI has a lot of hype, but it's backed by real crap that's useful. Crypto, on the other hand, never really did anything practical except help people buy drugs and launder money. Or make people money in the name of investments.