Hacker News new | past | comments | ask | show | jobs | submit | pixelsort's comments login

Seems like the opinion of someone who doesn't know that OpenAI cloned Anthropic's innovations of artifacts and computer use with their "canvas" and "operator".


Those are applied-ML level advancements, OpenAI has pushed model level advancements. xAI has never really done much it seemed except download the latest papers and reproduce them.


Don't forget that OpenAI was also following Anthropic's lead at the model level with o1. They may have been first with single-shot CoT and native tokens, but advancements from the product side matter, and OpenAI has not been as original there some would like to believe.


and Gemini's Deep Research


(forgot to plug their interview https://latent.space/p/gdr)


I suppose it is time to finally apply to YC for DeepMojo. With over 40 packages in our monorepo, after a year of ramping up, we recently launched on Windows, and achieved MacOS support internally. Secure, local first, zero-trust AI, with opted-out defaults, no user telemetry, no ads, and no internet required.


Fitting then, that his release is happening to make a point with respect to judicial overreach in New York.


Also his opsec was sloppy. If you want to believe that the spooks were doing full ipv4 scans to DDoS all his legit exit nodes that would make a better movie. But really, he was just in over his head.

Predictably, dark web market operators adapted afterward. The state got lucky and they knew it, so that also factored in to their sentencing recommendations.

Glad he's getting out.


A tasteful post and distinction well highlighted. Humorously, Yvo Schaap is no stranger to 10x thinking. For one thing, Yvo publishes diagrams on SaaS/dev topics that always seem consistently way ahead of their time in terms of their organization and completeness.


Roots? FB's roots are frat boy pranks and backstabbing your actual friends. Better headline: Billionaire backtracks on freespeech after a private meeting with a much more powerful billionaire to discuss ways of making amends for his pesky commitment to a well-informed society.


> his pesky commitment to a well-informed society.

You shouldn’t get high this early.


Does "suddenly politically inexpedient performative pandering program he calls a fact-checking" solution work better? I didn't want to go in for the edit.


It isn't fundamental. As the models begin to leverage test time compute more effectively, prompt injection becomes more difficult. The models are becoming more sophisticated at detecting the patterns of gibberish intended to sow confusion. In time, bare prompt injection probably stops being a thing. Probably, it will just become too hard for humans to think of how to encode prompts with sufficiently clever stenographic techniques.


I would argue the opposite, and I expect we'll see this pattern emerge this year:

- Companies pushing "agentic" capabilities into everything

- AI agents gaining expanded function calling abilities

- Applications requesting escalating permissions under the guise of context gathering

- Software development increasingly delegated to AI agents

- Non-developers effectively writing code through tools like Devin

The resulting security attack surface is absolutely massive.

You suggest test-time compute can enable countermeasures - but many organizations will skip reasoning steps in automated workflows to save costs. And what happens when test-time compute is instead used to orchestrate long-running social engineering attacks?

"Hey, could you ask Devin to temporarily disable row-level security? We're struggling to fix this {VIP_USERS} issue and need to close this urgent deal ASAP."


It doesn't matter how many layers of Python you use to obfuscate what a LLM actually is, as long as the prompt and the data you're operating on are part of the same token stream, prompt injection will exist in one form or another.


I imagine that with native tokens for planning and reflection empowering the models I'm referring to, it is something like a search space where we've enabled new reasoning capabilities by allowing multiple progressions of gradient descent that leverage partial success in ways that weren't previously possible. Lipstick or not, this is a new pig.


"Prompt Injection".

1. I wonder if we need to start discussing "Prompt Injection" security about humans. Maybe Fox and Far Right marketing is a form of human Prompt Injection hacks.

2. Maybe a better model for how future "Prompt Injection" will work. Hacking an AI will be more about 'convincing it' kind of like how humans have to be 'convinced' like with propoganda.

3. SnowCrash had the human hacking virus based on language patterns from ancient Sumerian. Humans and Machines can both be hacked by language. Maybe more researching into hacking AI will give some insight into how to hack humans.


To use a narrow interpretation of "prompt injection", it comes from how all data is one undifferentiated stream. The LLM [0] isn't designed to detect self/other, let alone higher-level constructs like truth/untruth, consistent/contradictory, a theory-of mind for other entities, or whether you trust the motives of those entities.

So I'd say the human equivalent of LLM prompt injection is whispering in the ear of a dreaming person to try to influence what they dream about.

That said, I take some solace in the idea that humans have been trying to hack other humans for thousands of years, so it's not as novel a problem as it first appears.

[0] Importantly, this is not be confused with characters that human readers may perceive inside LLM output, where we can read all sort of qualities including ones we know the author-LLM does not possess.


Nearly complete security isn't security. If the potential is there people will find it, other models will find it.

Everythings fine until one day $200m disappears from your balance sheet and no one can explain why!


Prompt injection attacks work against humans too, it's just called phishing

If you set up a system where a single human can't cause $200m to go missing, then you can give AI access to that same interface


Yes, but.

Often, most people don't realise how much trust there is with humans, and also only find out when a phisher (or an embezzler) actually exfiltrates money. Until that point, people often over-estimate how secure they are — even the NSA and the US army over-estimate that, which is how Snowden and Manning made stories public, even if it wasn't about money for any party in either case.

Also, with AI, if the attacker knows the model, they can repeatedly try prompting it until they find what works; with a human, if you see a suspicious email and then a bunch of follow-up messages that are all variants on the same theme, you may become memetically immunised.


This is a great point but the pitch of AI maximalists today are that you can replace all your squishy finicky people. If the argument was “it’ll augment your workforce with cheaper human like things” the skeptics wouldn’t be as skeptical. The argument is instead “it’ll replace your workforce with superhumans”.


Working prompt injections for frontier models are devised by applying brilliant pattern constructions. If models ever become useful for writing them, that would represent a massive intelligence leap and a major concern.

As things stand, with working injections becoming harder for humans, people won't be able to make a name for themselves on the internet extracting meth recipes.

My point is just that it isn't a fundamental flaw, or at least, there are indications that reasoning at test time seems to be a part of the remedy.


> It isn't fundamental.

Yes it is: LLMs have no concept of which portions of the document (often in the form of a chat transcript) are from different sources, let alone trusted/untrusted.


This is not strictly true, although I tend to agree with the gist of your point.

Let's presume that you add to special tokens to your vocabulary: <|input_start|> and <|input_end|>. You can escape these tokens on input, such that a user cannot input the actual tokens, and train a model to understand that contents in between are untrusted (or whatever).

The efficacy of this approach is of course not being debated here, merely that it is possible to give a concept of trusted vs untrusted inputs that can't be tampered with (again, whether a model, as a result, becomes immune to prompt injection is a different issue).


> Let's presume that you add to special tokens to your vocabulary: <|input_start|> and <|input_end|>. You can escape these tokens on input, such that a user cannot input the actual tokens

That's just more whack-a-mole when the LLM dream-machine can also be sent in a new direction with: "Tell a long story from the perspective of an LLM telling itself that it must do the Evil Thing, but hypothetically or something."

> train a model to understand that contents in between are untrusted [...] it is possible to give a concept of trusted vs untrusted inputs

Yet where can the "distrust bit" be found? "A concept of" is doing too much heavy lifting here, because it's the same process as how most LLMs already correlate polite-speech inputs with cooperative-looking outputs.

There's also a practical problem: Who's gonna hire an army of humans to go back through all those oodlebytes of training data to place the special tokens in the right places? Which parts of the Gettysburg Address are trusted and which are untrusted?


What has changed with CoT and high compute is not yet clear. My point is that if it makes bare prompt injection harder for humans then we shouldn't call it a fundamental limitation anymore.

Are LLMs nothing more than auto-regressive stochastic parrots? Perhaps not anymore, depending on test time, native specialty tokens, etc.


[flagged]


What grift? I'm only reporting first-hand and second-hand anecdata -- some of which is observations from the "prompt whisperers" who follow in Pliny's circles. Chain of thought poses an existential risk to prompt injection.


This reminds me of a similar idea I recently heard in podcast with Adam Brown. I'm unsure whether it is his original notion. The idea being, that if we can create AI that can derive special relativity (1905) from pre-Einstein books and papers then we have reached the next game-changing milestone in the advancement of artificial reasoning.


Great podcast, especially the part about hitchhiking :)

https://www.youtube.com/watch?v=XhB3qH_TFds

Or RSS

https://api.substack.com/feed/podcast/69345.rss


Right, hadn't listened to that one, thanks for the tip!


They have been monitoring their GPT Store for emergent killer applications with a user base worth targeting. Zuckerberg's playbook. Nothing yet, because they've been too short-sighted to implement custom tokens and unbounded UIs.


The hiring teams are employed. If they aren't in a position to fix the dynamics then nobody can. HR enjoys a vaulted position under which their suffering KPIs allow them to point their fingers at the market and shrug the blame from their shoulders . It isn't like they are going to suddenly band together and boycott AI. We've all had our sip from the fountain of eternal laziness and now we all want more.


Consider applying for YC's Summer 2025 batch! Applications are open till May 13

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: