I find this position hard to grok. You’re complaining about people worrying abou...

danShumway · on May 2, 2023

> Is it that you just think P(AGI) is really low, so worrying about an unlikely future outcome bothers you when there is actual harm now?

Having now gotten a few opportunities to really use GPT-4 in-depth, I am much more bearish about the intelligence potential of LLMs than many other people are. This is not something I lose sleep over. But I don't like to focus on that because I'm not sure it matters.

I commented the same elsewhere, but there is no world where AI alignment is solved if prompt injection isn't. If you can't get an AI to reliably avoid swearing, how on earth can it be aligned?

So if you want to look though a long-term perspective and you're really worried about existential risks, the attitude towards prompt injection -- the willingness of the entire tech sector to say "we can't control it but we're going to deploy it anyway" -- should terrify you. Because how prompt injection gets handled is how general alignment will get handled.

The companies will display the same exact attitudes in both cases. They won't move carefully. They are proving to you right now that they will not be responsible. And at every step of the process there will be a bunch of people on HN saying, "okay, the AI goes a little rogue sometimes, but the problem is exaggerated, stop making such a big deal of it."

There is no point in talking about the long-term consequences of unaligned AI if we can't solve short-term alignment and short-term task derailing, because if threats like prompt injection are not taken seriously, long-term alignment is not going to happen.

theptip · on May 2, 2023

Thanks for clarifying. I strongly agree with your paragraphs 2-5, but I draw the opposite conclusion.

Many alignment researchers don’t think that solving prompt security will be similar to hard alignment challenges, I suspect it will at least somewhat (both requiring strong interpretability). Either way, it’s clearly a necessary precursor as you say.

Most people I know that take AGI risk seriously _are_ terrified of how cavalier the companies like Microsoft are being. Nadela’s “I want everyone to know we made Google dance” line was frankly chilling.

However, where I diverge from you is your final paragraph. Until very recently, as Hinton himself said, pretty much nobody credible thought this stuff was going to happen in our lifetimes, and the EA movement was considered kooky for putting money into AGI risk.

If most people think that the worst that could happen is some AI saying racist/biased stuff, some hacks, maybe some wars - that is business as usual for humanity. It’s not going to get anyone to change what they are doing. And the justification for fixing prompt security is just like fixing IoT security; a dumpster fire that nobody cares enough about to do anything.

If people, now, discovered an asteroid hurtling towards us, you’d hope they drop petty wars and unite (at least somewhat) to save the planet. I don’t happen to put P(doom) that high, but hopefully that illustrates why I think it’s important to discuss doom now. Put differently, political movements and the Overton Window take decades to effect change; we might not have decades until AGI takes off.