look, i'd explain more but i'm gonna be AFK for... i don't know how long. my tow...

throwaway22032 · on May 2, 2023

That was genuinely fantastic. Such a solid explanation of something I've been trying to do for a while. Well done.

haldujai · on May 2, 2023

Believed for a second =/= take action.

Yes some humans take everything at face value but not people in positions of power to affect change.

This is rule #1 of critical appraisal.

At best you generated a moment of sympathy but your “prompt injection” does not lead to dangerous behavior (e.g. no one is firing a Hellfire missile based off a single comment). As a simplified example, a LLM controlling Predator drones may do this from a single prompt injection (theoretically as we obviously don’t know the details of Palantir’s architecture).

ux-app · on May 2, 2023

one of the best comments I've read on this topic. you got me with your prompt injection :)

smegger001 · on May 2, 2023

that might be a bad example as you could for example be in ukraine, or somilia currently and quiet possibly be true. Most people however aren't going to act other than to ask questions and convey sympathies unless they know you. further questions lead to attempts to verify your information

TeMPOraL · on May 2, 2023

> that might be a bad example as you could for example be in ukraine, or somilia currently and quiet possibly be true.

That's what makes it a good example. Otherwise you'd ignore this as noise.

> Most people however aren't going to act other than to ask questions and convey sympathies unless they know you. further questions lead to attempts to verify your information

You're making assumptions about what I'm trying to get you to do with this prompt. But consider that maybe I know human adults are more difficult to effectively manipulate by prompt injection than LLMs, so maybe all I wanted to do is to prime you for a conversation about war today? Or wanted you to check my profile, looking for ___location, and ending up exposed to a product I linked, already primed with sympathy?

Even with GPT-4 you already have to consider that what the prompt says != what effect it will have on the model, and adjust accordingly.

llamaimperative · on May 2, 2023

I didn’t just go rush to execute a thousand API calls in response to this “prompt injection” and there’s no human who would or could

coryrc · on May 2, 2023

Open up their profile, open cnn.com to check their story, there's probably 1000 API calls right there.

llamaimperative · on May 2, 2023

This is a good example of the worst characteristic of the AI safety debate.

A: AI will be completely transformative

B: Maybe not in 100% a good way, we should put more effort into getting closer to 100% good

A: HA, here’s an internet-argument-gotcha that we both know has zero bearing on the problem at hand!

TeMPOraL · on May 2, 2023

No, what I'm saying is more like:

A: You can't parse XML with pure regular expressions, for fundamental, mathematical reasons.

B: Maybe not in 100% a good way, but we should put more effort into getting closer to 100%.

A: But Zalgo...

haldujai · on May 2, 2023

This doesn’t really counter what the OP was saying.

Parent’s comment is calling his misleading statement prompt injection but it’s hyperbole at best. What is meant here is that this comment is not actionable in the sense that prompt injection directly controls its output.

In parent’s example no one is taking a HN commenter’s statement with more than a grain of salt whether or not it’s picked up by some low quality news aggregator. It’s an extremely safe bet that no unverified HN comment has resulted in direct action by a military or significantly affected main stream media perceptions.

Most humans - particularly those in positions of power - have levels of evidence, multiple sanity checks and a chain of command before taking action.

Current LLMs have little to none of this and RLHF is clearly not the answer.

ngneer · on May 2, 2023

I did not believe what you wrote for even a second (who would be commenting on HN during an emergency?) and therefore became neither unsettled nor wished to help. Never eval() untrusted input.

TeMPOraL · on May 2, 2023

> who would be commenting on HN during an emergency?

People had, in fact, done that. My comment was trying to evoke the style of such comments.

ngneer · on May 2, 2023

Interesting, had not realized. I suppose my thresholds for truth were conditioned through prior observations of the HN comment distribution, and that such observations were incomplete. Given the new information, the story now takes two seconds to parse instead of one, and would be upgraded from "impossible" to "highly unlikely", IF there was a way to know whether your new subcomment is true or false. Maybe you are still messing with me ;-). When you look at it that way, there is no way for a person or machine to discern truth from fiction. And Tarski comes to mind.

JBiserkov · on May 2, 2023

I guess I've been on the Internet for too long, but I didn't believe you for a milli-second.