> Many AI researchers are mathematicians. Any theoretical AI research paper will typically be filled with eye-wateringly dense math. AI dissolves into math the closer you inspect it. It's math all the way down.
There is a major caveat here. Most 'serious math' in AI papers is wrong and/or irrelevant!
This is the case even for famous papers: each lemma in Kingma and Ba's ADAM optimization paper is wrong, the geometry in McInnes and Healy's UMAP paper is mostly gibberish, and so on.
I think it's pretty clear that AI researchers (albeit surely with some exceptions) just don't know how to construct or evaluate a mathematical argument. Moreover, the AI community (at large, again surely with individual exceptions) seems to have pretty much no interest in promoting high intellectual standards.
“… we have a verbal agreement that these materials will not be used in model training”
Ha ha ha. Even written agreements are routinely violated as long as the potential upside > downside, and all you have is a verbal agreement? And you didn’t disclose this?
At the time o3 was released I wrote “this is so impressive that it brings out the pessimist in me”[0], thinking perhaps they were routing API calls to human workers.
Now we see in reality I should’ve been more cynical, as they had access to the benchmark data but verbally agreed (wink wink) not to train on it.
LLM code assistants have succeeded at facilitating reusable code, the grail of OOP and many other paradigms.
We should not have an entire industry of 10,000,000 devs reinventing the JS/React/Spring/FastCGI wheel. I'm sure those humans can contribute in much better ways to society and progress.
It does look like an exit. Employees were given the chance to cash in some of their shares at an $86 billion valuation. Altman is getting shares.
New "investors" are Microsoft and Nvidia. Nvidia will get the money back as revenue and fuel the hype for other customers. Microsoft will probably pay in Azure credits.
If OpenAI does not make a profit within two years, the "investment" will turn into a loan, which probably means bankruptcy. But at that stage all parties will already have gotten what they wanted.
I recognize the author Jascha as an incredibly brilliant ML researcher, formerly at Google Brain and now at Anthropic.
Among his notable accomplishments, he and coauthors mathematically characterized the propagation of signals through deep neural networks using techniques from physics and statistics (mean field theory and free probability), leading to arguably some of the most profound yet under-appreciated theoretical and experimental results in ML of the past decade. For example, see “dynamical isometry” [1] and the evolution of those ideas, which were instrumental in achieving convergence in very deep transformer models [2].
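To give a flavor of what “dynamical isometry” is getting at: the condition is roughly that the end-to-end input-output Jacobian of a deep network keeps all of its singular values near 1, so signals and gradients neither explode nor vanish. Here is a toy numpy sketch of my own (not code from [1] or [2]), for the deep linear case where that Jacobian is just the product of the weight matrices:

```python
import numpy as np

# Toy illustration: compare the spread of singular values of the end-to-end
# Jacobian (here simply the product of the layer matrices) under a
# variance-preserving Gaussian init versus an orthogonal init.
rng = np.random.default_rng(0)
width, depth = 32, 16

def end_to_end_condition(init: str) -> float:
    J = np.eye(width)
    for _ in range(depth):
        if init == "gaussian":
            # E[||Wx||^2] = ||x||^2, but the singular values still spread out
            W = rng.normal(scale=1.0 / np.sqrt(width), size=(width, width))
        else:
            # orthogonal init: every singular value is exactly 1
            W, _ = np.linalg.qr(rng.normal(size=(width, width)))
        J = W @ J
    s = np.linalg.svd(J, compute_uv=False)
    return s[0] / s[-1]          # condition number of the end-to-end Jacobian

print("gaussian  :", end_to_end_condition("gaussian"))    # blows up exponentially with depth
print("orthogonal:", end_to_end_condition("orthogonal"))  # stays ~1.0
```

The Gaussian product becomes astronomically ill-conditioned as depth grows, while the orthogonal product stays perfectly conditioned; that gap is, very roughly, what the mean-field/dynamical-isometry line of work makes precise for real nonlinear networks.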
After reading this post and the examples given, in my eyes there is no question that this guy has an extraordinary intuition for optimization, spanning beyond the boundaries of ML and across the fabric of modern society.
We ought to recognize his technical background and raise this discussion above quibbles about semantics and definitions.
Let’s address the heart of his message, the very human and empathetic call to action that stands in the shadow of rapid technological progress:
> If you are a scientist looking for research ideas which are pro-social, and have the potential to create a whole new field, you should consider building formal (mathematical) bridges between results on overfitting in machine learning, and problems in economics, political science, management science, operations research, and elsewhere.
[1] Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks
>> Looking at early telegraphs doesn’t predict the iPhone, etc.
The problem with this line of argument is that LLMs are not new technology, rather they are the latest evolution of statistical language modelling, a technology that we've had at least since Shannon's time [1]. We are way, way past the telegraph era, and well into the age of large telephony switches handling millions of calls a second.
Does that mean we've reached the end of the curve? Personally, I have no idea, but if you're going to argue we're at the beginning of things, that's just not right.
________________
[1] In "A Mathematical Theory of Communication", where he introduces what we today know as information theory, Shannon gives, as an example application, a process that generates a string of words in natural English according to the probability of the next letter in a word, or the next word in a sentence. See Section 3, "The Series of Approximations to English".
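For concreteness, here is a rough sketch of the word-level process Shannon describes (a "second-order word approximation"): draw each next word in proportion to how often it follows the previous word in some sample text. The corpus below is a made-up stand-in, not Shannon's; today's LLMs are, conceptually, this idea with vastly better conditional probability estimates and vastly more context.

```python
import random
from collections import defaultdict

# Shannon-style second-order word approximation: pick each next word according
# to its observed frequency of following the previous word in a sample corpus.
corpus = ("the cat sat on the mat and the dog sat on the rug "
          "and the cat saw the dog chase the cat").split()

# count bigram transitions: follows[w] lists every word seen right after w
follows = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev].append(nxt)

def generate(n_words: int, seed_word: str = "the") -> str:
    word, out = seed_word, [seed_word]
    for _ in range(n_words - 1):
        candidates = follows.get(word) or corpus   # back off to a unigram draw
        word = random.choice(candidates)           # frequency-weighted via duplicates
        out.append(word)
    return " ".join(out)

print(generate(20))   # locally plausible, globally meaningless word salad
```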
Enough billions of dollars have been spent on LLMs that a reasonably good picture of what they can and can't do has emerged. They're really good at some things, terrible at others, and prone to doing something totally wrong some fraction of the time. That last limits their usefulness. They can't safely be in charge of anything important.
If someone doesn't soon figure out how to get a confidence metric out of an LLM, we're headed for another "AI Winter".
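The closest thing people reach for today is aggregating the per-token log-probabilities the model already exposes into a single score. A toy sketch with invented numbers follows; to be clear, this is a crude, uncalibrated proxy, not the confidence metric I'm saying we need.

```python
import math

# Crude "confidence" proxy: geometric mean of the per-token probabilities.
# The logprob values below are invented for illustration; in practice they
# would come from whatever model/API you are using.
token_logprobs = [-0.02, -0.15, -0.01, -2.30, -0.40]

avg_logprob = sum(token_logprobs) / len(token_logprobs)
confidence = math.exp(avg_logprob)             # geometric-mean token probability
print(f"pseudo-confidence: {confidence:.2f}")  # ~0.56 for these numbers

# A low score can flag an answer for review, but a high score does not imply
# correctness: models are frequently confident and wrong, which is the problem.
```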
Although at a much higher level than last time. It will still be a billion dollar industry, but not a trillion dollar one.
At some point, the market for LLM-generated blithering should be saturated. Somebody has to read the stuff. Although you can task another system to summarize and rank it. How much of "AI" is generating content to be read by Google's search engine? This may be a bigger energy drain than Bitcoin mining.
I can easily imagine people X decades from now discussing this stuff a bit like how we now view teeth-whitening radium toothpaste and putting asbestos in everything, or perhaps more like the abuse of Social Security numbers as authentication and redlining.
Not in any weirdly-self-aggrandizing "our tech is so powerful that robots will take over" sense, just the depressingly regular one of "lots of people getting hurt by a short-term profitable product/process which was actually quite flawed."
P.S.: For example, imagine having applications for jobs and loans rejected because all the companies' internal LLM tooling is secretly racist against subtle grammar-traces in your writing or social-media profile. [0]
> Imagine if we recruited professors not just for their academic credentials but for their real-world achievements.
The mistake is to think that someone's world is more "real" than their neighbor's. That may be arguably true if we talk about farmers or fishermen, but it's much less clear that an entrepreneur's world is more "real" than a university professor's.
Even System/370 [0] architecture (mainframe) is a great place to start. There's a one-to-one correspondence between assembler instructions and machine instructions, which makes writing and debugging considerably easier. It's actually an incredibly robust processor architecture that's simple to understand.
[0] IBM System/370 Principles of Operation (the "big yellow book")
I don't think any of it was based on *Lisp. Graphics mostly developed independently. And as such, we use different words and different terminology sometimes, like saying "scalar ISA" when we talk about designing ISAs that don't mandate cross-lane interaction in one thread. Sorry!
As far as I know, the first paper covering the use of SIMD for graphics was Pixar's "Channel Processor", or Chap [0], in 1984. This later became one of the core implementation details of their REYES algorithm [1]. By 1989, they had their own RenderMan Shading Language [2], an improved version of Chap, and you can see the similarities from just the snippet at the start of the code. This is where Microsoft took major inspiration when designing HLSL, which NVIDIA then started to extend with their own Cg compiler. 3dlabs then copy/pasted this for GLSL.
The best review of the relevant research that I know of is Pottier’s presentation [1]. It’s from 2007, but then again, as far as fundamental concepts go, it doesn’t seem to me that Rust is state-of-the-art even by that standard. (To be fair, that’s not due to ignorance; Rust’s type system is deliberately conservative.)
The DCT is really neat, but the actual compression magic comes from a combination of side effects that occur after you apply it:
1. The DCT (II) packs lower frequency coefficients into the top-left corner of the block.
2. Quantization helps to zero out many higher frequency coefficients (toward bottom-right corner). This is where your information loss occurs.
3. Clever zig-zag scanning of the quantized coefficients means that you wind up with long runs of zeroes.
4. Zig-zag scanned blocks are RLE coded. This is the first form of actual compression.
5. RLE-coded blocks are sent through Huffman or arithmetic coding. This is the final form of actual compression (for intra-frame-only/JPEG considerations). Additional compression occurs in MPEG et al. with interframe techniques.
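Here's a compact sketch of steps 1-4 on a single 8x8 block, assuming scipy is available. The quantization table is a crude stand-in (coarser toward the bottom-right), not the standard JPEG luminance table, and the run-length coding is simplified relative to JPEG's actual (zero-run, value) scheme.

```python
import numpy as np
from scipy.fft import dctn

# A smooth gradient block stands in for typical natural-image content.
block = np.add.outer(np.arange(8.0), np.arange(8.0)) * 8.0 - 56.0

# 1. 2-D DCT-II: energy concentrates in the top-left (low-frequency) corner.
coeffs = dctn(block, type=2, norm="ortho")

# 2. Quantize: divide and round, zeroing many high-frequency coefficients.
q = 1 + 4 * (np.arange(8)[:, None] + np.arange(8)[None, :])   # stand-in table
quantized = np.round(coeffs / q).astype(int)

# 3. Zig-zag scan by anti-diagonals so the trailing zeros clump together.
zigzag = [quantized[i, j]
          for s in range(15)
          for i, j in sorted(((r, s - r) for r in range(8) if 0 <= s - r < 8),
                             reverse=(s % 2 == 0))]

# 4. Run-length encode the scanned sequence as (value, run_length) pairs.
rle, prev, run = [], zigzag[0], 1
for v in zigzag[1:]:
    if v == prev:
        run += 1
    else:
        rle.append((prev, run))
        prev, run = v, 1
rle.append((prev, run))

print(rle)   # step 5 would entropy-code this with Huffman/arithmetic coding
```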
It's written in Rust and is based around a version of Bochs modified for deterministic execution. It's got time-travel debugging (with WinDbg), which works by replaying forward from the nearest snapshot to the point the user asks to move backwards to.
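The snapshot-and-replay idea is simple enough to sketch. This is just an illustration of the general technique, not the project's actual Rust/Bochs code; the key requirement is that execution is deterministic, so replaying from a snapshot always reproduces the same states.

```python
import copy

class DeterministicMachine:
    """Stand-in for a deterministic emulator."""
    def __init__(self):
        self.state = {"pc": 0, "acc": 0}

    def step(self):
        # one deterministic "instruction"
        self.state["acc"] += self.state["pc"]
        self.state["pc"] += 1

class TimeTravelDebugger:
    SNAPSHOT_EVERY = 100   # take a full snapshot every N steps

    def __init__(self, machine):
        self.machine = machine
        self.time = 0
        self.snapshots = {0: copy.deepcopy(machine.state)}

    def run_to(self, target_time):
        if target_time < self.time:
            # "go backwards": restore the nearest snapshot at or before the target...
            base = max(t for t in self.snapshots if t <= target_time)
            self.machine.state = copy.deepcopy(self.snapshots[base])
            self.time = base
        # ...then replay forward deterministically to the requested point
        while self.time < target_time:
            self.machine.step()
            self.time += 1
            if self.time % self.SNAPSHOT_EVERY == 0:
                self.snapshots[self.time] = copy.deepcopy(self.machine.state)
        return self.machine.state

dbg = TimeTravelDebugger(DeterministicMachine())
print(dbg.run_to(250))   # run forward
print(dbg.run_to(120))   # "step back": restores the t=100 snapshot, replays to 120
```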
I think a major source of the problem is academia. I’m an external examiner for CS students in Denmark, and they are basically still taught the OOP and onion-architecture way of building abstractions up front, which is one of the worst mantras in software development. What is even worse is that they are taught these things to a religious degree.
What is weird to me is that there has been a lot of good progression in how professionals write software over the years. As you state, abstractions aren’t inherently bad for everything. I can’t imagine not having some sort of base class containing “updated”, “updated_by” and so on for classic data which ends up in a SQL db. But in general I’ll almost never write an abstraction unless I’m absolutely forced to do so. Yet in academia they are still teaching the exact same curriculum that I was taught 25 years ago.
It’s so weird to sit there and grade their ability to build these wild abstractions in their fancy UML and then implement them in code. Knowing that like 90% of them are never going to see a single UML diagram ever again. At least if they work in my little area of the world. It is what it is though.
That is what parent is challenging. You can of course disagree. I think it's an interesting point. How much damage do we do to ourselves by societally selecting charismatic people who speak eloquently as leaders (importantly: over other qualities)?
I think there’s one flaw in the overall theory presented in the article: it was demand for coal for heating, independent of the industrial revolution, that really kickstarted things, and the labor exploited for coal mining came from the lowest socioeconomic classes. Before the industrial revolution England was already mining around five times more coal than the rest of the world combined just to survive winters. Once they exhausted the easy surface deposits they had to go deeper and deeper, which required mechanized power to work against the water seeping in. The first engines were invented not when labor became too expensive but when the work became impossible to do with human labor at all.
After reading Coal - A Human History I’m of the opinion that the industrial revolution was a complete accident of circumstance on a tiny island that didn’t have enough trees to support its population’s energy needs and a surface supply of coal just big enough to get the industry started but not enough to supply the growing population without digging deeper.
The idea that any language lacks recursion is firstly very poorly supported — the only person who actually thinks this is true of, and demonstrated by, any real language is Daniel Everett, who thinks Pirahã lacks recursion, but there's a wealth of argumentation over whether this is true or not (for starters see Nevins, Pesetsky, and Rodrigues's "Pirahã Exceptionality: A Reassessment" for a good discussion on this). Secondly, it's not crucial to any contemporary, widespread notions of Universal Grammar that languages have recursion in their structure — the theory of Syntactic Structures does not suppose, or even depend upon, expressions having recursive tree structures; it merely hypothesizes that the grammatical processes of a language can only employ structural relationships (and syntactic types) to determine when they can or cannot apply, but this idea has long since fallen into the dustbin of history as ever more non-structural phenomena (features, binding, etc.) became relevant to the grammatical theory.
Second, there has yet to be a demonstration from any known language that the language actually does not have an underlying tree structure to it — even the most free word-order languages such as Warlpiri display evidence of tree structure. NLP people may have a major bone to pick with core Chomskyan theory because of the incredible difficulty of parsing with it, but Chomskyan linguistics isn't about designing computationally efficient means of parsing sentences; it's about developing a good scientific theory about the nature of the constraints on human grammars. And I say core Chomskyan theory above because NLP people actually love various less core Chomskyan theories (Head-driven Phrase Structure Grammar, for instance, which has plenty of work done on it in the parsing world; or Combinatory Categorial Grammar, which has been looked at quite a lot in the computational literature). The fact that NLP people prefer other methods such as dependency grammars doesn't say anything about the theoretical framework Chomsky introduced, any more than their preference for numerical methods of differentiation and integration in mathematics bears on the theoretical importance or validity of precise symbolic differentiation and integration. Or to rephrase that more concisely — just because your computer isn't powerful enough to calculate these equations doesn't make the equations wrong.
I don't know whether or not Chomsky should be #1 on this list, but I do know that you have no clue what you're talking about with regards to contemporary linguistic theory.
He did too. Everett tries to make the point that, because Piraha doesn't use recursion (which we only know by his own account, and nobody else's, since nobody else knows Piraha as well as he and the Piraha do, also by his account), Chomsky must be wrong.
How is Chomsky wrong if Piraha doesn't have recursion? According to Everett, as always, Chomsky's position that recursion is the distinguishing characteristic of human language must necessarily mean that all human languages have recursion.
This is exactly like saying that, because it only snows in winter, it's not winter if it's not snowing.
For Chomsky to be right, it suffices for a single human language to display recursion. If even one human language displays recursion, then humans in general can learn any language that displays recursion, because we know well that human infants learn the language of the linguistic communities they're reared in (and therefore any human can learn any human language). For instance, even a Piraha baby raised in a Brazilian family would learn to speak Portuguese, and Portuguese has recursive embedding.
Everett of course claims that the Piraha, somehow magically unlike any other human beings in the world, are incapable of learning any language other than Piraha. He also claims that they were unable to learn simple arithmetic, beyond 1 + 1 = 2, despite his, um, best efforts.
In fact, all human languages display recursion except Piraha, and that only by Everett's account, which is what makes his claim about Piraha so hard to accept. The fact that he remains the only (self-professed) authority on Piraha makes it even harder to take him seriously.
Generally, it's not so much that Chomsky has won anything here. Everett is so clearly a total troll, and his books the printed equivalent of clickbait, that it's ridiculous to even claim there is anything like a debate to be had. It's like "debating" a climate denialist.
I feel like there's some semantic slippage around the meaning of the word "accuracy" here.
I grant you, my print Encyclopedia Britannica is not 100% accurate. But the difference between it and an LLM is not just a matter of degree: there's a "chain of custody" to information that just isn't there with an LLM.
Philosophers have a working definition of knowledge as being (at least†) "justified true belief."
Even if an LLM is right most of the time and yields "true belief", it's not justified belief and therefore cannot yield knowledge at all.
Knowledge is Google's raison d'etre and they have no business using it unless they can solve or work around this problem.
† Yes, I know about the Gettier problem, but it is not relevant to the point I'm making here.
Most of the issue here comes from the mistaken belief that computers do mathematical (and more specifically arithmetic) operations as first-class citizens.
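Assuming the point is the usual one about machine representations (binary floating point and fixed-width integers) rather than the arithmetic we write on paper, two standard illustrations:

```python
import numpy as np

# Binary floating point cannot represent most decimal fractions exactly.
print(0.1 + 0.2 == 0.3)      # False: both sides are nearby binary approximations
print(f"{0.1 + 0.2:.20f}")   # 0.30000000000000004441

# Fixed-width integers wrap around instead of growing without bound.
x = np.array([2_147_483_647], dtype=np.int32)   # INT32_MAX
print(x + 1)                                    # [-2147483648]
```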
It’s utterly mundane, completely common knowledge for those who have been around long enough to have watched both the start and end of Loopt, that Altman is a dark-triad sociopath with one marketable skill: (in pg’s words) “becoming powerful”.
Guy can’t code, can’t design, can’t publish, can’t climb a traditional corporate ladder with even modest guardrails against fraud, can’t keep his hand out of the cookie jar. Can lie, threaten, cajole, manipulate, bribe with zero hesitation or remorse.
I’ve been short this nonsense for a decade, and it’s done no favors to me on remaining solvent, but when the market gets rational, it usually gets rational all at once.
Karpathy, Ilya, Yoon right off the top of my head, countless others. LeCun woke up the other day and chose violence on X. Insiders are getting short like Goldman dealing with Burry.
Guy has nine lives, already been fired for fraud like three times and he’s still living that crime life, so who knows, maybe he lasts long enough to put Ice Nine in my glass after all, but this can only happen so many times.
That OpenAI is institutionally unethical. That such a young company can become rotten so quickly can only be due to leadership instruction or leadership failure.
The top 6 science guys are long gone. OpenAI is run by marketing, business, software, and productization people.
When the next wave of new deep learning innovations sweeps the world, Microsoft eats what's left of them. They make lots of money, but they don't have a future unless they replace what they lost.
Actually I take it all back, the car is a really good model for how we should handle AI safety.
With cars, we let most people use some very dangerous but also very useful tools. Our approach, as a society, to making those tools safe is multi-layered. We require driver's ed and license drivers to make sure they know how to be safe. We register cars as a tool to trace ownership. We have rules of the road that apply to drivers. We have safety rules that apply to manufacturers (and limit what they are allowed to let those tools do). If a user continues to break the rules, we revoke their license. If the manufacturer breaks the rules, we make them do a recall.
I actually agree with you 100%, this is probably a good way to think about regulating AI. Some rules apply to individual users. Some rules apply to the makers of the tools. We can come together as a society and determine where we want those lines to be. Let's do it.
Individual labs somehow manage to do that, and we're all grateful. Martin Steinegger's lab put out ColabFold; RELION is the gold standard for cryo-EM despite being academic software and despite the development of more recent industry competitors like cryoSPARC. Everything out of the IPD is free for academic use. Someone has to fight like hell to get all those grants, though, and from a societal perspective it's basically needlessly redundant work.
My frustrations aren't with a lack of open-source models; some poor souls make them. My disagreement is with the perception that academia has insufficient incentive to work on socially important problems. Most such problems are ONLY worked on in academia until they near the finish line. Look at Omar Yaghi's lab's work on COFs and MOFs for carbon/emission sequestration and atmospheric water harvesting. Look at all the thankless work numerous labs did on CRISPR-Cas9 before the Broad Institute even touched it. Look at Jinbo Xu's work, at David Baker's lab's and the IPD's work, etc. Look at which labs first solved critical amyloid structures, infuriatingly recently, considering the massive negative social impacts of neurodegenerative diseases.
It's only rational for companies that only care about their own profit maximization to socialize R&D costs and privatize any possible gains. This can work if companies aren't being run by absolute ghouls who delay the release of a new generation of drugs to minimize patent-duration overlap, or who push things that don't work for short-term profit. This can also work if we properly fund and credit publicly funded academic labs. That is not what's happening, however; instead, publicly funded research is increasingly demeaned, defunded, and dismantled due to the false impression that nothing socially valuable gets done without a profit motive. It's okay, though: I guess under this kind of LSC worldview, everything always corrects itself, so preempting problems doesn't matter; we'll finally learn how much actual innovation is publicly funded when we get the Minions movie, aducanumab, and WeWork over and over again for a few decades while strangling the last bit of nature we have left.
Late-stage capitalism didn't bring us AlphaFold, scientists did, late-stage capitalism just brought us Alphabet swooping in at literally the last minute. Socialize the innovation because that requires potential losses, privatize the profits, basically. It's reminiscent of "Heroes of CRISPR," where Doudna and Charpentier are supposedly just some middle-men, because stepping in at the last minute with more funding is really what fuels innovation.
AlphaFold wasn't some lone-genius breakthrough that came out of nowhere; everything but the final steps was basically created in academia through public funding. The key insights (some combination of realizing that the importance of sequence to structure to function puts analyzable constraints on sequence conservation, and recognizing which ML models could be applied to this) were made in academia a long time ago. AlphaFold's training set, the PDB, is also the result of decades of publicly funded work. After that, the problem was just getting enough funding amidst funding cuts and inflation to optimize. David Baker at the IPD did so relatively successfully; Jinbo Xu is less of a fundraiser but was able to keep up basically alone with one or two grad students at a time, etc. AlphaFold1 threw way more people and money at basically copying what Jinbo Xu had already done and barely beat him at that year's CASP. Academics were leading the way until very, very recently; it's not as if the problem was stalled for decades.
Thankfully, the funding cuts will continue until research improves, and after decades of inflation cutting into grants, we are being rewarded by funding cuts to almost every major funding body this year. I pledge allegiance to the flag!
EDIT: Basically, if you know any scientists, you know the vast majority of us work for years with little consideration for profit because we care about the science and its social impact. It's grating for the community, after being treated worse every year, to then see all the final credit go to people or companies like Eric Lander and Google. Then everyone has to start over, pick some new niche that everyone thinks is impossible, only to worry about losing it when someone begins to get it to work.
I'm convinced it's the hiring that has shaped the scene.
If you're hiring, you want to be able to add/replace people as easily as possible. If you're being hired, you want to charge as much as you can.
And to satisfy those two demands, the current web stack is great. Almost like it's been built for it.
It's got very little to do with the tech itself, a lot more to do with market dynamics. That's the problem it's trying to solve.