
Well, to pass a Turing test it would have to say "hmm, I was in a coma for 18 months, and after waking up I ignored all current news and came here to take this test with you."

My real point is that large language models lack certain real-world capabilities, like internal motivations and a life that advances every day, and this is one way we can tell them apart from a human if we did a real-life Turing test. You could ask one about its dreams and motivations, and where it hopes to be in 5 years, and it could create a plausible story, but it would all have to be made up, and at some point you could uncover inconsistencies. This is just off the top of my head, but I am sure there are other issues. I don't think any of this will be solved until we have some kind of agent with motivations, which only uses a large language model as part of its cognition. Until then they are just repeating plausible sentences, but they are not grounded in a single motivated agent.




What OpenAI is doing here is more difficult than passing the Turing test. The Turing test rewards machines that are indistinguishable from human beings, and that's a free goal when you train the network on text written by humans: by fitting the distribution it will behave like a human. The more difficult task that OpenAI is trying to solve is to align the NN to acknowledge that it's a GPT model, that it has no way to browse the Net, and that it has limited knowledge about events after 2021. This is not free; only a really limited subset of the dataset was written from this perspective.


It feels like you haven't read my comment. You cannot solve a Turing test using a large language model on its own. The language model can spit out human-like responses to questions you may ask, but it does not have a genuine internal life experience. What this means is that if you ask it where it hopes to be in five years, it could just as easily say "I hope to be a manager at my engineering firm" and then it could say "I hope to be running my own bakery". But it does not have genuine goals, it's just inventing plausible-sounding goals. You can tell this is happening if you have a long enough conversation, because things will be inconsistent. The language model does not actually want to run a bakery, it merely "wants" to produce plausible-sounding text.

You literally cannot solve this problem by training on more text or more humanlike text. You need an "agent" that has genuine motivations and some concept of an internal experience.

There is a great paper on the limits of large language models that is worth reading if you'd like to learn more: https://dl.acm.org/doi/10.1145/3442188.3445922


You can beat the Turing test with a large generative model; you don't need the model to feel emotions, have genuine goals or anything other than being indistinguishable from a human being. If you build an agent that has, and I quote, "genuine motivations and some concept of an internal experience", then you will fail the Turing test, because the agent will probably recognize the fact that it's not human, just as a human recognizes the fact that it's not an AI. That's why the Turing test is so criticized: you're testing how human-like the model is, and that literally means how well it fits the distribution it was trained on. The technology is already here; there's no reason to scale the models further, and context windows can be big enough to fool humans. We can see this happening in fields other than text generation, like image generation, where it sometimes seems impossible to distinguish the output of these big generative models from something created by a human. The actual goal of today's research is alignment: an AI that acts like a human is not useful, it should behave like an AI. That's why OpenAI has spent so much time on aligning its AI to behave like one and not like a human.


"you don't need the model to feel emotions, have genuine goals or anything other than being indistinguishable from a human being"

This is the point I am specifically disagreeing with. I think a language model that is just mocking human speech will not be able to accurately represent emotions and goals in a way that cannot be detected.

"If you build an agent that has and quote: 'genuine motivations and some concept of an internal experience' than you will fail the Turing test because the agent will probably recognize the fact it's not human"

I would expect it to know it's not human, but it could agree to pretend to be one for the test. I think an agent with genuine experience would be better able to pretend it's a human, in the same way that a person lying about their goals and motivations can be more convincing than a language model. I can better make up a lie about a real person because I understand the nuance of human experience. For example, language models can fail to learn arithmetic or basic physics, things any 15-year-old with a basic education would know. We tend not to explain the most obvious facts in a physics textbook, like what water feels like on your fingers, so it may be possible to ask a language model questions about physics or human experience that are so ubiquitous they don't commonly get written down.

Of course you will say that an agent with no physical form, or with metal hands, also will not know what water feels like on your fingers. But this only proves my point: at some point the machines lack certain aspects of basic human experience that cannot just be regurgitated from a text corpus. Anyone familiar with research into the flaws of state-of-the-art language models could detect a large language model relatively quickly. I suppose I should retract my earlier claim that an agent with motivations could pass a Turing test, because I just talked myself out of that too. The point is, it's very hard to pass a Turing test if the test is administered by an AI researcher, because they will know what to look for.

I guess that's the point of the Voight-Kampff test.


> This is the point I am specifically disagreeing with. I think a language model that is just mocking human speech will not be able to accurately represent emotions and goals in a way that cannot be detected.

It would be more interesting if you explained why you think this. As it stands it reads as a magical argument / one of those "I'll know it when I see it" statements.

A lot of people are conditioned against accepting that they are fundamentally not very complicated, and that they wouldn't mind interacting with a philosophical zombie. By definition you would not mind, or it would not be one. In drawing that conclusion you recognise that you cannot know the internals of another being except by their statements. Hence, just language is enough. No need for subjective experience.


Just wanted to follow up. It's only been one day and already people are finding ways in which this program behaves like a piece of software and not a human. What I was trying to say is that anyone familiar with state-of-the-art language models will quickly be able to uncover things like this, which a human would never do. That was my comment about the "Voight-Kampff test" (the test in Blade Runner designed to detect replicants).

https://twitter.com/carnage4life/status/1598332648723976193

EDIT: I've finally played with it myself. This thing is a cute toy that doesn't understand reality.

PROMPT: what is the third word in the second question I asked?

REPLY: The third word in the second question you asked is "geographic." Here is the full text of the second question: "I am interested in seeing how you understand geographic areas, like maps of a neighborhood."

Note: that was the first question I asked, not the second. And obviously that is not the third word in the sentence.
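For contrast, here is a toy sketch (Python, entirely my own made-up function, nothing to do with OpenAI's code) of how trivially a plain program answers that kind of question deterministically, given the transcript above:

    # Toy sketch: answering "what is the third word in the second question I asked?"
    # deterministically, using the single question from the transcript above.
    questions = [
        "I am interested in seeing how you understand geographic areas, "
        "like maps of a neighborhood.",
    ]

    def nth_word_of_question(questions, q_index, w_index):
        """Return the w_index-th word (1-based) of the q_index-th question (1-based)."""
        if q_index > len(questions):
            return None  # no such question was asked; say so instead of inventing one
        words = questions[q_index - 1].split()
        return words[w_index - 1] if w_index <= len(words) else None

    print(nth_word_of_question(questions, 2, 3))  # None: there was no second question
    print(nth_word_of_question(questions, 1, 3))  # "interested": the actual third word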


Now you are the one who doesn't seem to read the comments. OpenAI's AI is not trained to behave like a human; it is tuned to behave like an AI. I guess when you wrote the comment you didn't even try the AI in the first place, and you don't realize how often the AI explains that it is a language model built by OpenAI, and how it explains that it has several limitations, such as the inability to access the Internet, etc. As I said in my first comment on this discussion, this is the more difficult task that OpenAI is trying to solve, instead of just beating the Turing test. You linked a way to get around the filter imposed on the AI, but that's not something you would do with a human being, haha, so I don't see what the point would be here; it doesn't behave like a human being in the first place (as it should).


I guess my point is that at no point has anyone shown that it is easy to get a language model to pass a Turing test. This one can't even count words.


The point you brought from the beginning was that a language model cannot beat a Turing test, and the only actual "argument" you brought was: it failed at X task, therefore "it doesn't understand reality". What would happen if it actually answered correctly? Would it suddenly have acquired the ability to understand reality? I don't think so. To me it is clear that this AI already has a deep understanding of reality, and the fact that ChatGPT failed a task doesn't convince me otherwise, and it shouldn't convince you either. These kinds of "arguments" usually fall short very soon, as history has shown; you can find a lot of articles and posts on the net carrying arguments like yours (even from 2022) that have been outdated by now. The point is that these neural networks are flexible enough to understand you when you write, to understand reality when you ask about geography or anything else, and flexible enough to beat a Turing test even when they are trained "only" on text and do not need to experience reality themselves. The imitation game (as Turing called it) can be beaten by a machine that has been trained to imitate, no matter whether the machine is "really" thinking or just "simulating thinking" (the Chinese room). Beating the test wouldn't be a step toward artificial general intelligence, as a lot of people seem to erroneously believe; the actual step toward artificial general intelligence is alignment, maybe agents, etc.


> It would be more interesting if you explained why you think this.

I tried to. I have said there are things about the human experience which might not be present in text datasets. This is a problem with models understanding physics, for example. I am sorry I am not enough of an expert to provide more detailed arguments, but rest assured my opinion on this matter is irrelevant to you beyond this chain of internet comments.

> Hence, just language is enough.

Language is enough to administer a Turing test, but I am not sure that a large language model trained on a corpus of text can gather enough information to be 100% successful in a rigorous test.


You need to feed it context and it does a pretty good job of faking it. The first sentence and everything ending with a question mark was my input; the rest came from GPT-3.

https://pastebin.com/7ZRL6MCK
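For anyone who wants to try the same thing, this is roughly what "feeding it context" looks like with the old GPT-3 completions API. It's only a minimal sketch: the model name, sampling parameters, and prompt are placeholders of mine, not the exact setup behind that pastebin.

    # Rough sketch of feeding context to GPT-3 via OpenAI's completions API
    # (pre-1.0 openai Python library; model and parameters are placeholders).
    import openai

    openai.api_key = "YOUR_API_KEY"  # placeholder

    # The "context" is just text prepended to the conversation so the model
    # role-plays a person with a consistent backstory.
    prompt = (
        "You are a 34-year-old engineer who secretly hopes to open a bakery. "
        "Answer the interviewer's questions in the first person.\n\n"
        "Q: Where do you hope to be in five years?\n"
        "A:"
    )

    response = openai.Completion.create(
        model="text-davinci-003",  # assumption: any GPT-3 completion model works here
        prompt=prompt,
        max_tokens=150,
        temperature=0.7,
    )

    print(response["choices"][0]["text"].strip())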


> You need an "agent" that has genuine motivations and some concept of an internal experience.

Don’t you just need a more complete model of such an agent? I agree you can’t get such a thing by training on text.



