A good literary production. I would have been proud of it had I thought of it, but I can't help observing a strong "whataboutery" element: if we use "stochastic parrot" as shorthand and you dislike the term, then now you understand why we dislike the constant use of "infer", "reason" and "hallucinate".
Parrots are self-aware, with complex reasoning brains that can solve problems in geometry, tell lies, and act socially or asocially. They also have complex vocal cords and can perform mimicry. Very few aspects of a parrot's behaviour are stochastic, but that also underplays how complex stochastic systems can be in what they produce. If we label LLM products as Stochastic Parrots it does not mean they like cuttlefish bones or are demonstrably modelled by Markov chains like Mark V Shaney.
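For contrast, the Mark V Shaney style of generation really is just a word-level Markov chain, something like the minimal sketch below (the toy corpus, function names and order-2 setting are purely illustrative assumptions, not a reconstruction of the original program):

    import random
    from collections import defaultdict

    def build_chain(text, order=2):
        """Map each n-gram of words to the words observed to follow it."""
        words = text.split()
        chain = defaultdict(list)
        for i in range(len(words) - order):
            chain[tuple(words[i:i + order])].append(words[i + order])
        return chain

    def generate(chain, length=12):
        """Walk the chain, sampling each next word from observed continuations."""
        state = random.choice(list(chain.keys()))
        out = list(state)
        for _ in range(length):
            followers = chain.get(state)
            if not followers:  # dead end: this n-gram was never continued
                break
            out.append(random.choice(followers))
            state = tuple(out[-len(state):])
        return " ".join(out)

    corpus = "the parrot repeats what the parrot hears and the parrot hears a lot"
    print(generate(build_chain(corpus)))

That lookup-and-sample loop is what the "demonstrably modelled by Markov chains" comparison refers to; whatever else one thinks of LLMs, they are not reducible to a table like this.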
Well, parrots can make more parrots and LLMs can't make their own GPUs, so parrots win. But LLMs can interpolate and even extrapolate a little; have you ever heard a parrot do translation, hearing you say something in English and rendering it in Spanish? So no, LLMs are not parrots. Besides their debatable abilities, they work with a human in the loop, which means humans push them outside their original distribution. That's not a parroting act; it's doing more than pattern matching and reproduction.
I don't like wading into this debate when semantics are so personal/subjective. But to me it seems almost like a sleight of hand to add the stochastic part, when the weight possibly falls more on the parrot part. Parrots are much more concrete, whereas the term LLM could refer to the general architecture.
The question to me seems to be: if we expand on this architecture (in some direction: compute, size, etc.), will we get something much more powerful? Whereas if you give nature more time to iterate on the parrot, you'd probably still end up with a parrot.
There's a giant impedance mismatch here (time scale being one part of it). Unless people want to think of parrots as a subset of all animals, so that 'stochastic animal' is what they really mean. But then it's really the difference between 'stochastic human' and 'human', and I don't think people really want to face that particular distinction.
"Expand the architecture" .. "get something much more powerful" .. "more dilithium crystals, captain"
Like I said elsewhere in this overall thread, we've been here before. Yes, you do see improvements from larger datasets and models weighted over more inputs. I suggest, or to be more honest I believe, that no amount of "bigger" here will magically produce AGI simply through the scale effect.
There is no theory behind "more", which means there is no constructed sense of why it should work, and the absence of abstract inductive reasoning continues to tell me that this stuff isn't making a qualitative leap into emergent anything.
It's just better at being an LLM. Even "show your working" points to complex causal chains, not actual inductive reasoning, as I see it.
And that's actually a really honest answer. Whereas someone of the opposite opinion might argue that parroting, in the general copying-a-template sense, actually generalizes to all observable behaviours, because templating systems can be Turing-complete or something like that. It's templates all the way down, including complex induction, as long as there is a meta-template that can match on its symptoms and be chained.
Induction is a hard problem, but humans can skip the infinite compute time (I don't think we have any reason to believe humans have infinite compute) and still give valid answers, because there is some (meta-)structure to be exploited.
Whether machines / NNs can architecturally exploit that same structure is the truer question.
> this stuff isn't making a qualitative leap into emergent anything.
The magical missing ingredient here is search. AlphaZero used search to surpass humans, and the whole Alpha family from DeepMind is surprisingly strong but narrowly targeted. The AlphaProof model uses LLMs and Lean to solve hard math problems. The same kind of problem-solving CoT data is being used by current reasoning models, and they get much better results. The missing piece was search.
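As a rough illustration of what "adding search on top of a generator" means, here is a minimal beam-search sketch. The expand() and score() functions are toy stand-ins I've made up; in an AlphaProof-style system the proposer would be an LLM suggesting next proof steps and the scorer a verifier such as Lean or a learned value model. None of this reflects DeepMind's actual implementation:

    import heapq

    # Toy stand-ins: expand() proposes candidate next steps, score() rates a
    # partial solution. In a real system these would be an LLM and a verifier.
    def expand(state):
        return [state + [step] for step in ("a", "b", "c")]

    def score(state):
        # Arbitrary toy heuristic: prefer shorter states with more "a" steps.
        return state.count("a") - len(state)

    def beam_search(start, width=2, depth=3):
        """Keep only the `width` highest-scoring partial solutions at each depth."""
        beam = [start]
        for _ in range(depth):
            candidates = [child for state in beam for child in expand(state)]
            beam = heapq.nlargest(width, candidates, key=score)
        return max(beam, key=score)

    print(beam_search(start=[]))

The point is only structural: instead of committing to a single sampled continuation, the model's outputs become nodes in a search tree that an external signal can prune.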
I'm sure both of you know this, but "stochastic parrot" refers to the title of a research article that contained a particular argument about LLM limitations that had very little to do with parrots.
Firstly, this is meta ad hominem: you're ignoring the argument to target the speaker(s).
Secondly, you're ignoring the fact that the community of voices with experience in data science, computer science and artificial intelligence is itself split on the qualities, or lack of them, in current AI. GPT and LLMs are very interesting but say little or nothing to me about a new theory of mind, nor do they display inductive logic and reasoning, or even meet the bar for a philosopher's-cave solution to problems. We've been here before so many, many times. "Just a bit more power, captain" was very strong in connectionist theories of mind, fMRI brain-activity analytics, you name it.
So yes, there are a lot of "us" who are pushing back on the hype, and no, we're not a mini cult.
> GPT and LLM are very interesting but say little or nothing to me of new theory of mind, or display inductive logic and reasoning, or even meet the bar for a philosophers cave solution to problems.
The simple fact that they can generate language so well makes me think... maybe language itself carries more weight than we originally thought. LLMs got to this point without personal experience or embodiment; it should not have been possible, but here we are.
I think philosophers are lagging behind science now. The RL paradigm of agent-environment-reward learning seems to me a better one than what we have in philosophy. And if you look at how LLMs model language as high-dimensional embedding spaces... this could solve many intractable philosophical problems, like the infinite homunculus regress. Relational representations straddle the midpoint between 1st and 3rd person, offering a possible path over the hard-problem "gap".
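To make the "relational representations" point concrete: in an embedding space a word's meaning is just its position relative to other words. A toy sketch with hand-made three-dimensional vectors (real LLM embeddings have hundreds or thousands of dimensions; these numbers are invented purely for illustration):

    import math

    # Hand-made toy "embeddings"; the values are illustrative, not learned.
    vectors = {
        "parrot": [0.9, 0.1, 0.3],
        "bird":   [0.8, 0.2, 0.4],
        "gpu":    [0.1, 0.9, 0.7],
    }

    def cosine(u, v):
        """Cosine similarity: how aligned two vectors are, ignoring length."""
        dot = sum(a * b for a, b in zip(u, v))
        norm_u = math.sqrt(sum(a * a for a in u))
        norm_v = math.sqrt(sum(b * b for b in v))
        return dot / (norm_u * norm_v)

    print(cosine(vectors["parrot"], vectors["bird"]))  # high: related concepts
    print(cosine(vectors["parrot"], vectors["gpu"]))   # lower: unrelated concepts

Nothing in that representation is a first-person report or a third-person description; it is purely relational, which is why it looks like a candidate for straddling the two.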
There are a couple Twitter personalities that definitely fit this description.
There is also a much bigger group of people who haven't really tried anything beyond GPT-3.5, which for a long time was the best you could get without paying a monthly subscription. One of the biggest reasons for the r1 hype, besides the geopolitical angle, was that people could actually try a reasoning model for free for the first time.
i.e., the people who think AI is dumb? Or are you saying I'm in a cult for being pro it? I'm definitely part of that cult: the "we already have AGI and you have to contort yourself into a pretzel to believe otherwise" cult. Not sure if there is a leader, though.