
> Give me a concrete example of a harm prevented

One can only do this by inventing a machine to observe the other Everett Branches where people didn't do safety work.

Without that magic machine, the closest one can get to what you're asking for is to see OpenAI's logs of which completions, for which prompts, they're blocking; and if they share anything like that from the live model, rather than just from the original red-team effort leading up to launch, it's lost in the noise of all the other search results.




This is veering extremely close to tiger-repelling rock territory.


There's certainly a risk of that, but I think the second paragraph is enough to push it away from that problem in this specific instance.


The second paragraph is veering extremely close to shaman-worship territory -- shamans have privileged access to the otherworld that we mere mortals lack.


Again, I agree there's certainly a risk of that, but OpenAI did show at least some examples from their pre-release red teaming of GPT-4.

What OpenAI showed definitely doesn't convince everyone (see some recent replies to my other comments for an example). But as I find the examples sufficiently convincing, I am unfortunately unable to see things from the POV of those who don't, and therefore can't imagine what would change the doubters' minds.



