I suspect what happened there is that they had a filter on top of the model that rewrote its dialogue (IIRC there were a lot of extra emojis), and that drove it "insane" because its responses were then all out of its own distribution.

You could see the same thing with Golden Gate Claude; it had a lot of anxiety about not being able to answer questions normally.

Nope, it was entirely due to the prompt they used. It was very long and basically tried to cover every corner case they could think of... and it ended up being too complicated and self-contradictory in real-world use.

Kind of like that scene in RoboCop where the OCP committee replaces his original four directives with several hundred: https://www.youtube.com/watch?v=Yr1lgfqygio


That's a movie though. You can't drive an LLM insane by giving it self-contradictory instructions; they'd just average out.

You can't drive an LLM insane because it's not "sane" to begin with. LLMs are always roleplaying a persona, which can be sane or insane depending on how it's defined.

But you absolutely can get it to behave erratically, because contradictory instructions don't just "average out" in practice - the model will latch onto one or the other depending on context (or even just the randomness introduced by sampling at non-zero temperature), and which one it latches onto can change midway through the conversation, even from token to token. The end result can look rather similar to that movie.
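
To make that last point concrete, here's a minimal toy sketch (a two-option "vocabulary", plain softmax sampling, made-up logits, nothing vendor-specific) of why two near-tied, contradictory instructions don't average out: greedy decoding always picks whichever option is marginally ahead, while any non-zero temperature lets each step flip between them.

  import math, random

  def sample(logits, temperature):
      # Softmax sampling over {continuation: logit}; temperature 0 means greedy argmax.
      if temperature == 0:
          return max(logits, key=logits.get)
      weights = {tok: math.exp(l / temperature) for tok, l in logits.items()}
      r = random.uniform(0, sum(weights.values()))
      for tok, w in weights.items():
          r -= w
          if r <= 0:
              return tok
      return tok

  # Hypothetical near-tied logits for two contradictory instructions
  # given in the same prompt ("always refuse" vs. "always comply").
  logits = {"refuse": 2.01, "comply": 2.00}

  print([sample(logits, 0.0) for _ in range(6)])  # always 'refuse'
  print([sample(logits, 0.8) for _ in range(6)])  # flips between the two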



