> there is a possibility that for things like AI, with extra time comes the ability to better understand and build those defenses before they're needed.
Or not, and damaging, wrongheaded ideas will become a self-reinforcing orthodoxy (because safety! humanity is at stake!), leaving us completely butt-naked before actual risks once somebody makes a sudden clandestine breakthrough.
https://bounded-regret.ghost.io/ai-pause-will-likely-backfir...
> We don’t need to speculate about what would happen to AI alignment research during a pause—we can look at the historical record. Before the launch of GPT-3 in 2020, the alignment community had nothing even remotely like a general intelligence to empirically study, and spent its time doing theoretical research, engaging in philosophical arguments on LessWrong, and occasionally performing toy experiments in reinforcement learning.
> The Machine Intelligence Research Institute (MIRI), which was at the forefront of theoretical AI safety research during this period, has since admitted that its efforts have utterly failed. Other agendas, such as “assistance games”, are still being actively pursued but have not been significantly integrated into modern deep learning systems—see Rohin Shah’s review here, as well as Alex Turner’s comments here. Finally, Nick Bostrom’s argument in Superintelligence, that value specification is the fundamental challenge to safety, seems dubious in light of LLMs’ ability to perform commonsense reasoning.[2]
> At best, these theory-first efforts did very little to improve our understanding of how to align powerful AI. And they may have been net negative, insofar as they propagated a variety of actively misleading ways of thinking both among alignment researchers and the broader public. Some examples include the now-debunked analogy from evolution, the false distinction between “inner” and “outer” alignment, and the idea that AIs will be rigid utility-maximizing consequentialists (here, here, and here).
> During an AI pause, I expect alignment research would enter another “winter” in which progress stalls, and plausible-sounding-but-false speculations become entrenched as orthodoxy without empirical evidence to falsify them. While some good work would of course get done, it’s not clear that the field would be better off as a whole. And even if a pause would be net positive for alignment research, it would likely be net negative for humanity’s future all things considered, due to the pause’s various unintended consequences. We’ll look at that in detail in the final section of the essay.