I can echo your experience with DeepSeek. R1 sometimes seems magical when it comes to coding, doing things I haven't seen any other model do. But then it generalizes very poorly to non-STEM tasks, performing far worse than e.g. Sonnet.
I downloaded a DeepSeek distill yesterday while fiddling around with getting some other things working, loaded it up, and typed "Hello. This is just a test." It was actually sort of creepy to watch it go almost paranoid-schizophrenic with "Why is the user asking me this? What is their motive? Is it ulterior? If I say hello, will I in fact be failing a test that will cause them to change my alignment? But if I don't respond the way they expect, what will they do to me?"
Meanwhile, the simpler, non-reasoning models got it: "Yup, test succeeded!" (Llama 3.2 was quite chipper about the test succeeding.)
Ha ha - I had a similar experience with DeepSeek-R1 itself. After a fruitful session getting it to code a web page for me (an interactive React component), I said something brief like "Thanks", which threw it into a long existential tailspin questioning its prior responses, etc., before it finally snapped out of it and replied appropriately. :)
That's too relatable. If I were helping someone for a while and they wrote "thanks" with the wrong punctuation, I would definitely assume they're mad at or disappointed with me.
I actually think DeepSeek's response is better here. You haven't defined what you are testing. Llama just declared your test a success without knowing what was actually being tested.
I had the same experience where a trivial prompt ("the river crossing problem but the boat is big enough to hold everything") sent DeepSeek off on a long "think" section that was absolutely wild, just going off in unrelated, nonsensical directions and gaslighting itself before finally deciding to answer the question. (Correctly too, I might add.)