I feel like what's lost in this conversation is that ChatGPT is incredibly good at writing English. It basically never makes grammatical mistakes, it doesn't spew gibberish, and for the most part its replies are extremely well structured. The replies might be bullshit or hallucinations, but they're not gibberish.
It's kind of breathtaking how quickly we forgot that this was hard.
The goalposts are moving again.
BTW, it has passed many standardized tests under the same circumstances as a human.
Some of the replies are gibberish, especially once you get into technical subjects that it has very little training data on. It kitbashes words together that actually mean nothing, which is no surprise given that it's an LLM.
> BTW, it has passed many standardized tests under the same circumstances as a human.
No, it hasn’t, and it is physically impossible for it to. The extent to which the differences are material may be debatable, but this claim is simply false.
It would be a useful contribution to explain what you think the material differences are, rather than referencing them through innuendo, as if everyone already knows what you mean.
GPT-4 is absolutely more generally knowledgeable than any individual person. Individual humans can still easily beat it when it comes to knowledge of individual subjects.
Let’s not conflate knowledge with intelligence though. GPT-4 simply isn’t intelligent.
Would be curious to hear an elaboration on this perspective. In your opinion, on which measures of intelligence would GPT-4 fail to out-perform a human with an IQ of 80? Conversely, on which measures do you imagine it would succeed at doing so? Are the latter less significant or valid than the former?
Conscious thought. In biological terms it has a superhuman cerebellum but no cerebral cortex at all. It can't assess what it's doing.
GPT4 will produce stuff, but only if prodded to do so by a human.
I recently asked it to help me write some code for a Garmin smartwatch. The language used for this is MonkeyC, of which there isn't a huge amount of examples on the internet.
It confidently provided me with code, but it was terrible. There were gaps with comments suggesting what it should do, bugs, function calls that didn't exist, and many other problems.
I pointed out the issues and GPT4 kept apologising and trying new stuff, but without any improvement. There wasn't any intelligence there; the model had just intuited what a program might look like from sparse data, and then kept doing the same thing. It didn't know what it was doing; it just took directions from me. It couldn't suggest ideas when it couldn't map to a concept in memory.
A human with an IQ of 80 would know if they didn't know how to code in MonkeyC. If they thought they did, they'd soon adjust their behaviour when they realised they couldn't. They'd know where the limit of their knowledge was. They wouldn't keep trying to guess what functions were available. If they didn't have any examples in memory of what the functions might be like, they might come up with novel workarounds, or they'd appreciate what program I was trying to write and suggest a different approach.
Presumably we'll make progress on this at some point, but I think it'll take new breakthroughs, not just throwing more parameters at existing models.
Exactly my experience. And that was with a fucking NGINX configuration, for which I provided it the documentation and the URL rewrite lines it would require. I spent days trying to find the value that other people claim it has.
Same. Those videos of people letting ChatGPT write their code have almost certainly edited out the hours they spent trying to force the thing to spit out usable code. ChatGPT simply doesn't have enough context, nor the ability to "remember" context, to do anything larger than a single function or two.
What makes it even more frustrating is that, to iterate, you constantly have to keep it updated with any changes you've made outside of ChatGPT.
Don't get me wrong, it's pretty useful but it is far from a silver bullet. Getting that last 20% (or even 30%) is going to be a lot of work...
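One workaround for the staleness problem is to rebuild the context from scratch on every turn, re-sending the current state of the relevant files instead of trusting the chat history. A rough Python sketch, assuming the official openai package (v1+ client style); the model name and file list are placeholders:

    # Rebuild the prompt from the files on disk each turn, so the model
    # sees your latest edits rather than the version from three replies ago.
    from pathlib import Path
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    FILES = ["app.py", "nginx.conf"]  # placeholders: whatever you're iterating on

    def ask(question: str) -> str:
        context = "\n\n".join(
            f"--- {name} ---\n{Path(name).read_text()}" for name in FILES
        )
        resp = client.chat.completions.create(
            model="gpt-4",  # placeholder model name
            messages=[
                {"role": "system", "content": "You are helping me edit the files below."},
                {"role": "user", "content": f"{context}\n\n{question}"},
            ],
        )
        return resp.choices[0].message.content

    print(ask("The rewrite rule still 404s. What am I missing?"))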
They have a specific device for that now. I've tried "write a random sentence with 6 words and 2 numbers" and it completely fails, but it can do the straightforward "write a random [x] of length [y]."
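(The nice thing about this failure is that the constraint is trivial to verify mechanically. A quick self-contained Python checker - the tokenization is deliberately naive:)

    # Check "a random sentence with 6 words and 2 numbers":
    # naive split on whitespace; trailing punctuation is stripped.
    def meets_constraint(sentence: str) -> bool:
        words = sentence.strip().rstrip(".!?").split()
        numbers = [w for w in words if w.strip(",.").isdigit()]
        return len(words) == 6 and len(numbers) == 2

    print(meets_constraint("The 3 cats chased 12 mice."))        # True
    print(meets_constraint("Randomly, seven birds flew away."))  # False (5 words, 0 numbers)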
Yup. I think this is the best point of comparison - a 4-6 year old kid. Specifically, one that hasn't gone to school yet. The difference between a typical 6-year-old and a typical adult is in large part that the latter spent 10+ years being systematically fine-tuned.
Logic, arithmetic, algebra, precisely following the steps of an algorithm - those are not skills one "kinda" just "gets" at some point; they're trained by deliberate practice, by solving lots and lots of problems specifically constructed to exercise those skills.
Point being, get GPT-4 through school, and then compare it with adult performance on math-adjacent tasks. Or at least give it a chance by prompting it to solve the problem step by step, so it can search closer to the slice of latent space that encodes relevant examples of similar problems and methods for solving them.
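To be concrete, the step-by-step nudge is purely a change to the prompt text. A minimal Python sketch, assuming the official openai package; the model name and the sample problem are placeholders:

    # Same question, but with an explicit instruction to show intermediate steps,
    # which tends to steer the model toward worked-example-style completions.
    from openai import OpenAI

    client = OpenAI()
    problem = "A train leaves at 3:40 pm travelling at 80 km/h. When has it covered 100 km?"

    resp = client.chat.completions.create(
        model="gpt-4",  # placeholder model name
        messages=[{
            "role": "user",
            "content": problem + "\n\nSolve this step by step, "
                       "showing each intermediate calculation before the final answer.",
        }],
    )
    print(resp.choices[0].message.content)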
I started seriously using computers at 2.5, started writing and recording songs with a tape recorder at 3 (one of them won a local award), and started playing chess at 4. I know plenty of people with similar experiences. If you nurture kids and don't treat them like they're stupid, they can do some quite impressive things.
Anecdote: admittedly, I'm autistic, as are the people I know, so maybe that's not a good sample. I struggle with a lot of basic shit even as an adult. Oh god, I empathize with the hypothetical GPT5.
It would be very helpful to define intelligence before asserting that a thing does not have it. A cursory look at the Wikipedia page for the definition of intelligence shows there is no single, agreed-upon definition. In fact, some believe that defining "intelligence" amounts simply to pointing at ourselves.
> Individual humans can still easily beat it when it comes to knowledge of individual subjects.
What does a phrase like "GPT-4 scores 90th percentile on the Uniform Bar Exam" mean to you, regarding whether humans can easily surpass its knowledge and reasoning?
> What does a phrase like "GPT-4 scores 90th percentile on the Uniform Bar Exam" mean to you, regarding whether humans can easily surpass its knowledge and reasoning?
Absolutely nothing, because of construct validity. Those tests measure things that have been shown to correlate with abilities of concern in humans, and so are, for their purposes, valid for humans.
This hasn’t been demonstrated for LLMs, and assuming construct validity without establishing it is begging the question: it presumes not only that LLMs are general intelligences, but that they are general intelligences structurally similar to human intelligences, such that the proxy measures for cognitive capacities work similarly.
I suppose, when GPT-4 writes correctly working code that does what you want on the first try, this says absolutely nothing about its cognitive capacity, because, after all, it's just a proxy measurement for the underlying generative process. (Yes, obviously the cognition is _different_ from what happens in humans. That does not mean that... it isn't intelligence?)
> I suppose, when GPT-4 writes correctly working code that does what you want on the first try, this says absolutely nothing about its cognitive capacity
It says something about its ability to write code. Beyond that... it's impossible to say.
We simply don’t have enough information about generative AI models to be able to generalize about them from limited proxies; psychometrics is not transferable from humans to them - or at least, we have neither evidence nor a strong theoretical reason to think it should be.
If the imitation becomes indistinguishable from the real thing on every test that can possibly be generated in the universe, then it is an intelligence.
In that sense, because we are making progress on producing an indistinguishable imitation... you might as well say we are making progress on an actual sentient intelligence.
Great take. But I think when autonomous agents become good enough, intelligence is certainly possible. Especially when those agents start to interact with the real world.
When you speak to someone with an 80 IQ, do they introduce themselves by saying "Hello, I have an 80 IQ, nice to meet you," so that, like the person I responded to above, you can compare their conversation skills to ChatGPT4's?
First off, you wouldn't need to do that specifically. You'd only need to know that most of the people you talk to are above an 80 IQ on any given topic; in fact, most people are around a 100 IQ on any given topic. So you already have a reasonable baseline for comparison.
Secondly, I'd say you're likely the one missing the OP's point by taking a mostly colloquial statement - that ChatGPT is about as informed as the bottom X% of the population on any given topic - and being pedantic about it. Furthermore, the real thrust of the OP's point is that X% is a lower bound: even if X isn't 16% but 5%, it's only going to go up from here. Yes, there's evidence of diminishing returns with the current architectures, but there's also a lot of room for growth with newer architectures or multimodal models.
I think most people understand the OP's point without needing to go around asking everyone what their IQ is. There are numerous indicators, both formal and informal, that ChatGPT is as informed on almost any given topic as the bottom 16% of the population. In fact, it's likely much, much higher than that.
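(For reference, those percentages fall straight out of the usual IQ norming of mean 100 and standard deviation 15, which you can check with the Python standard library:)

    # IQ scores are conventionally normed to a normal distribution
    # with mean 100 and standard deviation 15 (stdlib, Python 3.8+).
    from statistics import NormalDist

    iq = NormalDist(mu=100, sigma=15)
    print(iq.cdf(80))  # ~0.091 -> IQ 80 is roughly the 9th percentile
    print(iq.cdf(85))  # ~0.159 -> IQ 85 (one SD below) is roughly the 16th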
I agree with you in general, but you're off in using "IQ on the topic". I'm almost sure "on the topic" does not make sense for IQ.
GPT's IQ is general in the sense that it can solve novel tasks that some IQ-80 individuals would not be able to, as long as the tasks and responses can be encoded in plain English.
That feels fundamentally different than a calculator.