Hacker News
Introducing Superalignment (openai.com)
219 points by tim_sw on July 5, 2023 | 370 comments



You have to give them credit for putting their money where their mouth is here.

But it's also easy to parody this. I am just imagining Ilya and Jan coming out on stage wearing red capes.

I think George Hotz made sense when he pointed out that the best defense will be having the technology available to everyone rather than a small group. We can at least try to create a collective "digital immune system" against unaligned agents with our own majority of aligned agents.

But I also believe that there isn't any really effective mitigation against superintelligence superseding human decision making aside from just not deploying it. And it doesn't need to be alive or anything to be dangerous. All you need is for a large amount of decision-making for critical systems to be given over to hyperspeed AI and that creates a brittle situation where things like computer viruses can be existential risks. It's something similar to the danger of nuclear weapons.

Even if you just make GPT-4 say 33% smarter and 50 or 100 times faster and more efficient, that can lead to control of industrial and military assets being handed over to these AI agents. Because the agents are so much faster, humans cannot possibly compete, and if you interrupt them to try to give them new instructions then your competitor's AIs race ahead the equivalent of days or weeks of work. This, again, is a precarious situation to be in.

There is huge promise and benefit from making the systems faster, smarter, and more efficient, but in the next few years we may be walking a fine line. We should agree to place some limitation on the performance level of AI hardware that we will design and manufacture.


The recent paper about using gpt-4 to give more insight into its actual internals was interesting, but yeah the risks seem really high at the moment that we'd accidentally develop unaligned AGI before figuring out alignment.

Out of the options to reduce that risk I think it would really take something like this, which also seems extremely unlikely to actually happen given the coordination problem: https://time.com/6266923/ai-eliezer-yudkowsky-open-letter-no...

You talk about aligned agents - but there aren't any today and we don't know how to make them. It wouldn't be aligned agents vs. unaligned, it's only unaligned.

I don't think spreading out the tech reduces the risk. Spreading out nuclear weapons doesn't reduce the risk (and with nukes at least it's a lot easier to control the fissionable materials). Even with nukes you can still create them and decide not to use them, not so true with superintelligent AGI.

If anyone could have made nukes from their computer, humanity might not have made it.

I'm glad OpenAI understands the severity of the problem though and is at least trying to solve it in time.


What is the "recent paper about using gpt-4 to give more insight into its actual internals?"



Unaligned doesn't really seem like it should be a threat. If it's unaligned it can't work toward any goal. The danger is that it aligns with some anti-goal. If you've got a bunch of agents all working unaligned, they will work at cross-purposes and won't be able to out-think us.


Alignment is about agreement with human preferences and desires, not internal consistency. e.g. An AI that wanted to exterminate humanity could work towards that goal, but it would be unaligned (unaligned with humanity). Alignment is basically making sure humanity is fine with what the AI does.


Humanity has more than one alignment...


Yes, that's part of the reason why alignment is such a huge problem.

You can imagine an AI that answers questions and helps you get things within reason that doesn't hurt anyone else plus corrections for whatever problems you imagine with this. That's roughly an aligned AI. It will help you build a bomb as a fun experiment, but would stop you from hurting someone with the bomb.


Apart from some obvious cases that everyone agrees with, alignment is not a big problem, it is an incoherent one. It can’t be “solved” any more than the problem of what the best ice cream flavor is can be solved.

Humanity doesn’t have unified interests or shared values on many things. We have different cultural memories and different boundaries. What to some is an expression of a fundamental right is to others an affront.


At the limit sure there’s variance, but our shared selected history has a lot in common, something a non-human intelligence would not get for free: https://www.lesswrong.com/posts/4ARaTpNX62uaL86j6/the-hidden...

I’m also not a moral relativist, I don’t think all values are equivalent, but you don’t even need to go there - before that point a lot of what humans want is not controversial and the “obvious” cases are not so obvious or easy to classify.


Ya, why do you think there are alarm bells sounding off everywhere right now…

The capabilities are coming fast. There is no alignment.


The most likely alignment we will get is the alignment of money to power.


This is a misunderstanding of what AI alignment problems are all about.

Alignment != capability

Think of a paperclip-maximizing robot that, in the process of creating paperclips, kills everyone on earth to turn them into paperclips.


Corporations like Saudi Aramco are already doing that. You don't need a superintelligent AI, corporations that maximize profit are already sufficient as misaligned superhuman agents.


You can't maximize profit without customers, so they must be aligned with someone.


They're aligned with the military-industrial complex. The US military is one of the biggest consumers of fossil fuels[1] and it's the same with other nations and their energy use. So profitable is not the same as aligned with human values.

1: https://en.m.wikipedia.org/wiki/Energy_usage_of_the_United_S...


> The US military is one of the biggest consumers of fossil fuels

I guess this phrasing is up for debate, but according to the source linked "the DoD would rank 58th in the world" in fossil fuels.

Is that a huge amount of fossil fuel use? Absolutely. But one of the biggest?


> According to the 2005 CIA World Factbook, if it were a country, the DoD would rank 34th in the world in average daily oil use, coming in just behind Iraq and just ahead of Sweden.

Sure, the phrasing could be debated but the fact that it even ranks close to actual nation states is already problematic. The US military is basically an entire nation state of its own. This is nothing new if you're old enough to have observed the kind of damage it has done but it demonstrates my point about profit and alignment. Profits are very often misaligned with human values because war is extremely profitable.


Oh there's no denying the US military has ballooned to the size of a small to medium-sized country. That alone is a huge issue for me personally - I don't agree with our country having any form of standing military, but that precedent was abandoned 80 years ago.

I'm not sure how to properly compare the military of one country with the entirety of a country ~1/30th the size. On the surface it doesn't seem crazy for those to have similar budgets or resource use.


The comparison is in terms of energy use since at the end of the day that is the fundamental currency of all techno-industrial activity. The point is that the global machinery that is currently guiding civilizational progress is fundamentally anti-life. It constantly grows and subsumes whatever energy resources are accessible without any regard for negative externalities like pollution and environmental degradation. This is why I don't take AI alarmism seriously because the problem is not the AI, the problem is the organization of techno-industrial civilization and its focus on exponential growth.

It's only going to keep getting worse and the AI alarmism is not doing anything to address the actual root causes of the crisis. If anything, AI development might actually make things more sustainable by better allocating and managing natural resources so retarding AI progress is actually making things worse in the long run.


I think those really are separate concerns that should both be given more attention.

There's a strong correlation between GDP growth and oil use, that's a huge problem and one that likely can't be solved without fundamentally revisiting modern economic models.

AI poses its own concerns though, everything from the alignment problem to the challenge of defining what consciousness even is. AI development won't inherently make allocating natural resources easier - with the wrong incentive model and a lack of safety rails, AI could find its own solution to preserving natural resources that may not work out so well for us humans.


The current model is already destructive and most of the market is managed by artificial agents. Schwab will give you a roboadvisor to manage your retirement account so AI is already managing large chunks of the financial markets. Letting AI manage not just the financial aspects but things like farmland is an obvious extension of the same principle and since AIs can notice more patterns it's going to become basically a necessity because global warming is going to make large parts of existing farmlands unmanageable. Floods and droughts are becoming more common and humans are very bad at figuring out the weather so there will be an AI agent monitoring weather patterns and allocating seeds to various plots of land to maximize yields.

Bill Gates has bought up a bunch of farmland and I am certain he will use AI to manage them because manual allocation will be too inefficient[1].

1: https://www.popularmechanics.com/science/environment/a425435...


US DOD fuel use being the level of Sweden doesn’t seem problematic to my envelope-math; it seems to reflect the size of the entities involved.

Iraq is a now-broken third world country/economy in recovery, so not a great comparable to the US. Sweden is small but a good comparable culturally/development-wise. The US is 331 million people. It spends 3% of GDP on the military. 3% of 331m is roughly 10 million. Sweden is 10 million people. U.S. military fuel use is in line with Sweden’s.

I could be off here (DOD!=US military?), corrections welcome, but I wouldn’t even be shocked if a military entity uses 3-10x more fuel than a civilian average and above math puts us surprisingly close to 1x.
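For what it's worth, the envelope math is easy to check with the thread's own numbers (a rough sketch in Python; treating the 3% GDP share as a proxy for the military's share of fuel use is the assumption above, not sourced data):

    # Back-of-envelope check using only the figures in this thread; the 3%
    # GDP share standing in for fuel share is an assumption, not data.
    us_population = 331_000_000     # people
    military_gdp_share = 0.03       # ~3% of GDP spent on the military
    sweden_population = 10_000_000  # people

    military_population_equivalent = us_population * military_gdp_share
    print(f"{military_population_equivalent / 1e6:.1f} million")                # ~9.9 million
    print(f"{military_population_equivalent / sweden_population:.2f}x Sweden")  # ~0.99x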


Math seems correct, but the US military also includes conglomerates and companies like Palantir and Anduril (the main reason it is described as an industrial complex is that there is no clear distinction between the corporations and the military, given how their activities are tied up with military spending and energy use).


Bit of an interesting thought experiment there, could a corporation maximize profit without customers? I wonder if we can find any examples of this type of behavior...


Yes, but a profit maximizer doesn’t need to eliminate all humans to become a big problem.


In fairness, corporations can still be fraudulent.


No, I understand what you're saying, I just think you're wrong. To be a little clearer: you're assuming a single near-omnipotent agent randomly selects an anti-goal and is capable of achieving it. If we instead create 100 near-omnipotent agents odds are that the majority will be smart enough to recognize that they have to cooperate to achieve any goals at all. Even if the majority have selected anti-goals, it's likely that the majority of the anti-goals will be at cross-purposes. You'll also have a paperclip minimizer, for example. Now, the minimizers are a little scary but these are thought experiments and the goals will not be so simple (nor do I think it would be obvious to anyone including the AIs which ones have selected which goals.) The AIs will have to be liars if they select anti-goals, and they will have to not only lie to us but lie to each other, which makes coordination very hard bordering on impossible.

In some ways this is a lot like Bitcoin, in that people think that with enough math and science expertise you can just reason your way out of social problems. And you can, to an extent, but not if you're fighting an organized social adversary that is collectively smarter than you. 7 billion humans is a superintelligence and it's a high bar to be smarter than that.


It’s worth reading about the orthogonality thesis and the underlying arguments about it.

It’s not an anti-goal that’s intentionally set, it’s that complex goal setting is hard and you may end up with something dumb that maximizes the reward unintentionally.

The issue is all of the AGIs will be unaligned in different ways because we don’t know how to align any of them. Also, the first to be able to improve itself in pursuit of its goal could take off at some threshold and then the others would not be relevant.

There’s a lot of thoughtful writing that exists on this topic and it’s really worth digging into the state of the art about it, your replies are thoughtful so it sounds like something you’d think about. I did the same thing a few years ago (around 2015) and found the arguments persuasive.

This is a decent overview: https://www.samharris.org/podcasts/making-sense-episodes/116...


> the first to be able to improve itself in pursuit of its goal could take off at some threshold and then the others would not be relevant.

Thanks for reminding me that I need to properly write up why I don't think self-improvement is a huge issue.

(My thought won't fit into a comment, and I'll want to link to it later).


And I'm less concerned about emergent alignment with an anti-goal (paperclip optimization) than I am with a scenario like ransomware designed by malicious humans using a super AI aligned with an anti-goal.


"Even if you just make GPT-4 say 33% smarter and 50 or 100 times faster and more efficient, that can lead to control of industrial and military assets being handed over to these AI agents."

I call BS on this...it's an LLM...


It's important to recognize that the model is fully capable of operating in open world environments, with visual stimuli and motor output, to achieve high level tasks. This has been demonstrated in proofs of concept several times now with systems such as Voyager et al. So while there are certainly some details that are important, many of them are the annoyances that we devs deal with all the time (how to connect various parts of a system properly, etc.); the fundamental expressive capabilities of these models are not that limited. Certainly limited in some sense (as seen in the several papers applying category-theoretic arguments to transformers), but for many engineering applications in the world, these models are very capable and useful.

Guarantees of correctness and safety are obviously of huge concern, hence the main article. But it's absolutely not unreasonable to see these models allowing humanoid robots capable of various day to day activities and work.


To save others the trouble, I googled Voyager, it's pretty interesting. I had no idea an LLM could do this sort of thing:

https://voyager.minedojo.org/



> https://palm-e.github.io/

The alignment problem will come up when the robot control system notices that the guy with the stick is interfering with the robot's goals.


A robot control system without a mechanical override in favor of the stick is a poor one indeed.


Voyager is pretty cool, but it's not transferable to the real world at all. The automatic curriculum relies on lots of specific knowledge from people talking about how to get better at Minecraft. The skill library writes programs using the Mineflayer API, which provides primitives for all physics, entities, actions, state etc. A real-life analogue of that would be like solving robotics and perception real quick.


I don't understand why Voyager benefits from being an LLM, vs a "normal" Neural Net. It's not talking to anyone or learning from text.


> We introduce Voyager, the first LLM-powered embodied lifelong learning agent to drive exploration, master a wide range of skills, and make new discoveries continually without human intervention in Minecraft. Voyager is made possible through three key modules: 1) an automatic curriculum that maximizes exploration; 2) a skill library for storing and retrieving complex behaviors; and 3) a new iterative prompting mechanism that generates executable code for embodied control.

It looks like being LLM-based is helpful for generating control scripts and communicating its reasoning. Text seems to provide useful building blocks for higher-order reasoning and behavior. As with humans!
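For anyone curious what those three modules roughly look like wired together, here is a minimal sketch of a Voyager-style loop. It is purely illustrative: the `llm` callable, the `env` object with `state()`/`execute()`, and the dict-based skill library are hypothetical stand-ins, not the real Voyager or Mineflayer APIs.

    # Illustrative sketch only; "llm" and "env" are hypothetical stand-ins.
    def voyager_like_loop(llm, env, max_iters=50, max_retries=3):
        skill_library = {}  # task description -> verified code snippet

        for _ in range(max_iters):
            # 1) Automatic curriculum: ask the model to propose the next task,
            #    given the agent's current state and what it already knows.
            task = llm(f"Known skills: {list(skill_library)}. "
                       f"Agent state: {env.state()}. Propose the next task.")

            # 2) Skill library: retrieve previously stored skills that share
            #    words with the new task, as extra context.
            related = [code for name, code in skill_library.items()
                       if any(w in name.lower() for w in task.lower().split())]

            # 3) Iterative prompting: generate code, run it in the environment,
            #    and refine it from the error feedback until it works.
            code = llm(f"Write code for: {task}\nRelated skills:\n" + "\n".join(related))
            for _ in range(max_retries):
                ok, feedback = env.execute(code)
                if ok:
                    skill_library[task] = code  # store the new verified skill
                    break
                code = llm(f"The code failed with: {feedback}\nRevise:\n{code}")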


> It's important to recognize that the model is fully capable of operating in open world environment

How so? If they cannot drive a car?


What evidence do you have that allows you to make the assertion that they 'cannot drive a car'?


Saying it's "an LLM" doesn't change the impact. GPT4 is an LLM, and so are many others ranging from toy quality to GPT3.5. It is very clear GPT4 is much better. If there is another jump like GPT4 , whether it is LLM or not, it's impact will be huge.


Meanwhile, GPT-4 still can’t reliably multiply small numbers.

https://arxiv.org/abs/2304.02015


Do you find it comforting that a system whose objective is to complete the next word is, as an emergent property, able to make drawings?


Imagine you meet a human who is eloquent, expressive, speaks ten languages, can pass the bar or the medical board exams easily, but who cannot reliably distinguish between truth and falsehood on the smallest of questions ("what is 6x9? 42") and has no persistent memory or sense of self.

Would you be "comforted" that this mega-genius is worse at arithmetic than you are and doesn't remember what it did yesterday?

Probably not. You might well be worried that this weird psychopath is going to get a medical license and cut the wrong number of fingers off of a whole bunch of patients.


We're agreeing, aren't we?


A minor inconvenience when GPT-4 has no problem learning how to use a code interpreter.


It's alright with algorithmic prompts - https://arxiv.org/abs/2211.09066

Also, it knows when to use a calculator if it has access to one, so it's not a big deal.
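To make the calculator point concrete, here is a minimal sketch of that pattern. The `llm` completion function and the CALC(...) convention are made up for illustration; real deployments would use a proper tool/function-calling API rather than regex post-processing.

    import re

    # Minimal sketch: the model is told to emit CALC(<expression>) for any
    # arithmetic, and a plain Python evaluator substitutes the result.
    def answer_with_calculator(llm, question):
        draft = llm("If you need arithmetic, write CALC(<expression>) "
                    "instead of computing it yourself.\n\nQuestion: " + question)

        def fill_in(match):
            expr = match.group(1)
            # Only evaluate simple arithmetic expressions.
            if re.fullmatch(r"[0-9+\-*/(). ]+", expr):
                return str(eval(expr))
            return match.group(0)  # leave anything else untouched

        return re.sub(r"CALC\(([^)]*)\)", fill_in, draft)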


"This Apple II is useless. It can't even run Crysis."


Plus the next thing might not be an LLM.


Military command and control is already performed via input and output of token streams.


It's autocomplete on steroids…

That can guide me through the process of writing a Navier-Stokes simulation…

In a foreign language…

That can be trivially put into a loop and tasked with acting like an agent…

And which is good enough that people are already seriously asking themselves if they need to hire people to do certain tasks…

Why call BS?

It's not perfect, sure, but it's not making a highly regional joke about the Isle of Wight Ferry[0] either.

[0] "What's brown and comes steaming out the back of Cowes?"


But you're also autocomplete (prediction engine) on steroids.

https://www.psy.ox.ac.uk/news/the-brain-is-a-prediction-mach...


"It's one of those irregular verbs, isn't it? I'm good at improv and speaking on my feet, you finish each other's sentences, they're just autocomplete on steroids."

https://en.wikiquote.org/wiki/Yes,_Minister


LLM is already a misnomer. Many of the latest models are better called LFMs (Large Foundation Models). They have multimodal capabilities. Some can even handle sensory input humans can't.

Another comment already links to demos and papers of LFMs operating robots and agents in 3D environments.


GPT-4 is actually multimodal, not just an LLM. OpenAI just doesn't provide the public with any way to use the image embedding capabilities.


It’s not money where their mouth is…

It’s paying a cost of doing business. The minimum theater required to minimize expected regulatory cost.

They want to own the safety issue so they can risk your life for their profit.


Call it insurance. It’s the R&D cost to try to make sure your models don’t do/say anything that will get you into trouble.


> Even if you just make GPT-4 say 33% smarter and 50 or 100 times faster (...) humans cannot possibly compete

I sincerely doubt that. GPT-4 and its ilk excel at the five-paragraph essay on topics that are so well understood by humans that books have been written about them. ChatGPT-4 is a very useful tool when writing text. But it is useful in the sense that a thesaurus and a spell checker are useful.

What ChatGPT-4 truly sucks at is understanding a large amount of text and synthesizing it. That token limit is really a problem if you want GPT to become a scientist or a military strategist. Strategy requires you to consume a huge amount of less-than-certain information and to synthesize it into a coherent strategy, preferably explainable in terms POTUS can understand. Science is the same thing. Play the PhD game that was just featured on the HN front page. It is a lot of false starts, a lot of reading, again things GPT just cannot do.

By the way, their text understanding is really a lot less than human. A nice example is 'word in context' puzzles. In these puzzles, a target word is used in two different sentences, and the task is to decide whether the word is used with the same meaning or not. ChatGPT-4 does better than 3.5, but it doesn't take a lot of effort to trick it. Especially if you ask a couple of questions in one prompt, it will easily trip up.
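If you want to try this yourself, a word-in-context probe is easy to script. A rough sketch (the `ask_model` wrapper and the two example items are hypothetical, just to show the shape of the test):

    # Hypothetical word-in-context probe: the same word in two sentences,
    # and the model must say whether the meaning matches.
    WIC_ITEMS = [
        # (word, sentence 1, sentence 2, same meaning?)
        ("bank", "She sat on the bank of the river.",
                 "He deposited the check at the bank.", False),
        ("cold", "The soup had gone cold.",
                 "The lake water was cold.", True),
    ]

    def run_wic_probe(ask_model):
        correct = 0
        for word, s1, s2, same in WIC_ITEMS:
            prompt = (f"Is the word '{word}' used with the same meaning in both "
                      f"sentences? Answer yes or no.\n1. {s1}\n2. {s2}")
            reply = ask_model(prompt).strip().lower()
            correct += (reply.startswith("yes") == same)
        return correct / len(WIC_ITEMS)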


> Even if you just make GPT-4 say 33% smarter

What is your unit of intelligence?


> the best defense will be having the technology available to everyone rather than a small group

Proliferation of a dangerous technology is the best defense?

Sure, it's a libertarian meme, but it wouldn't work for nuclear weapons or virus research. Maybe that would make sense, but the argument needs to be made.


There's nothing libertarian about this. Inventing and securing dangerous technology is an affront to individual freedoms. The idea of proliferation as deterrence is totally separate from libertarianism and is rooted more in fear than anything else.


The prioritization of individual freedoms above most other considerations (along with the assumption that it will work out better in the end) is what libertarianism is all about. Maybe you’re a libertarian without realizing it? :)


"And it doesn't need to be alive or anything to be dangerous"

Why are tech people stuck in the now and not future looking?


I just think it's much easier to convince people that existing types of AIs will get somewhat smarter and significantly faster. And that's dangerous enough.

My own belief is that regardless of what we do in terms of the most immediate dangers, within one or two centuries (maximum) we will enter the posthuman era where digital intelligent life has taken control of the planet. I don't mean "posthuman" as in all of the humans have been killed (necessarily), just that what humans 1.0 do won't be very important or interesting relative to what the superintelligent AIs are doing.

I don't think there is anything that prevents people from giving AI all of the characteristics of animals (such as humans). I think it's foolish, but researchers seem determined to do it.

But this is fairly speculative and much harder to convince people of.


If the value of superintelligence is to lead to an Age of Em scenario where AIs (or Ems) do most of the intellectual labor, the reality is still that they would be doing this labor in service of humans. I could see a scenario where it is done in service of the AIs instead, but it would look nothing like the existential risk stuff bandied about by these weenies.


There is no example in our knowledge of any lifeform prioritizing (writ large) the well-being of a different lifeform over its own.


Why call AIs a life form? They aren't like cellular life.


I think the assumption they were making was that rather than an LLM this was a type of AI that has animal-like characteristics. Which sounds fanciful but at least at a functional level you could get some main aspects just by removing guardrails from a large multimodal model and instructing it to work on its own goals, self preservation, etc. And researchers are working hard to create more lifelike systems that wouldn't necessarily be very similar to LLMs.


The animal like systems might be interesting to observe, but it doesn't sound like they would be useful for doing much work. I am not sure where the reliance on them would come in.


How do you jump to this? What is it that would inherently lead an intelligent species dramatically smarter than us to stay focused on servicing us?

We humans sure didn't do this. We're genetically extremely similar to other primates and yet we destroy their habitats, throw them in zoos, and use them for lab experiments.


Currently, LLMs seem to prioritize their current goal, so if the goal is solving math puzzles or genetic problems, they would probably keep doing that too.


I'd love to be able to see more about how the main LLMs are really trained and limited with regards to their goals and scoring algorithms.

It seems reasonable that they wouldn't deviate, but that depends on how specifically and wholly the original goals were defined. We'd basically be attempting to outwit the LLMs, I'm not sure if that's realistic or not.


Control of military and industrial assets won't be handed willy nilly to AIs, given the threat of legal liability for any mistakes the AI could make. Their tendency of making things up is well known by now.


You've been spewing out nonsense at an impressive pace. Stop digging. Read more, write less.


And yet the military regularly hands machine guns to 18 year-olds...


The 18-year-old human alignment problem has been solved pretty well. Not perfectly, but enough to justify handing out such weapons.


From a layman's perspective when it comes to cutting-edge AI, I can't help but be a bit turned off by some of the copy. It seems to go out of its way to use purposefully exuberant language to make the risks seem even more significant, so that, as an offshoot, it implies the technology being worked on is that advanced. I'm trying to understand why it rubs me the wrong way particularly here, when, frankly, it is just about the norm anywhere else (see Tesla with FSD, etc.).


The extinction risk from unaligned superintelligent AGI is real, it's just often dismissed (imo) because it's outside the window of risks that are acceptable and high status to take seriously. People often have an initial knee-jerk negative reaction to it (for not crazy reasons, lots of stuff is often overhyped), but that doesn't make it wrong.

It's uncool to look like an alarmist nut, but sometimes there's no socially acceptable alarm and the risks are real: https://intelligence.org/2017/10/13/fire-alarm/

It's worth looking at the underlying arguments earnestly; you can start with an initial skepticism, but I was persuaded. Alignment has also been something MIRI and others have been worried about since as early as 2007 (maybe earlier?), so it's also a case of a called shot, not a recent reaction to hype/new LLM capability.

Others have also changed their mind when they looked, for example:

- https://twitter.com/repligate/status/1676507258954416128?s=2...

- Longer form: https://www.lesswrong.com/posts/kAmgdEjq2eYQkB5PP/douglas-ho...

For a longer podcast introduction to the ideas: https://www.samharris.org/podcasts/making-sense-episodes/116...


This is an interesting comment because lately it feels like it's very cool to be an alarmist! Lots of positive press for people warning about the dangers of AI, Altman and others being taken very seriously, VCs and other funders obviously leaning into the space in part because of the related hype.

And in other fields, being alarmist has paid off too with little recourse for bad predictions -- how many times have we heard that there will be huge climate disasters ending humanity, the extinction of bees, mass starvation, etc. (not to diminish the dangers of climate change which is obviously very real)? I think alarmism is generally rewarded, at least in media.


The extinction of bees, mass starvation, and the ozone hole (bonus) are all examples of alarmist takes that were course corrected. It’s sort of a weird spot to argue that they were overblown when the reason they are not a problem now is because they were addressed.


Bee extinction wasn't addressed, it was just revealed to not be true. Article with lots of data here:

https://www.acsh.org/news/2018/04/17/bee-apocalypse-was-neve...

Mass starvation wasn't "addressed" exactly, because the predictions were for mass starvation in the west, which never happened. Also the people who predicted this weren't the ones who created the Green Revolution.

Ozone hole is I think the most valid example in the list, but who knows, maybe that was just BS too. A lot of scientific claims turn out to be so, these days, even those that were accepted for quite a while.


What does it mean to address the risk of superintelligence? There is no way to stop technological progress and AI development is just part of the same process. Moreover, the alarmism doesn't make much sense because we already have misaligned agents at odds with human values, those agents are called profit seeking corporations but I never hear the alarmists talk about putting a stop to for-profit business ventures.

Do you know anyone who considers the pursuit of profits and constant exploitation of natural resources a problem that needs to be addressed? Because I don't. Everyone seems very happy with the status quo, and AI development is just more of the same status quo development, just corporations seeking ways to exploit and profit from digital resources. OpenAI is a perfect example of this.


> There is no way to stop technological progress

What makes you say this is impossible? We could simply not go down this road, there are only so many people knowledgeable enough and with access to the right hardware to make progress towards AI. They could all agree, or be compelled, to stop.

We seem to have successfully halted research into cloning, though that wasn't a given and could have fallen into the same trap of having to develop it before one's enemy does.


There are no enemies. The biosphere is a singular organism and right now people are doing their best to basically destroy all of it. The only way to prevent further damage is to reduce the human population but that's another non-starter so as long as the human population is increasing it will compel the people in charge to continue pushing for more technological "innovation" because technology is the best way to control 8B+ people[1].

Very few people are actually alarmed about the right issues (in no particular order): population size, industrial pollution, military-industrial complex, for-profit multi-national corporations, digital surveillance, factory farming, global warming, &etc. This is why the alarmism from the AI crowd seems disingenuous because AI progress is simply an extension of for-profit corporatism and exploitation applied to digital resources and to properly address the risk from AI would require addressing the actual root causes of why technological progress is misaligned with human values.

1: https://www.theguardian.com/world/2015/jul/24/france-big-bro...


> The biosphere is a singular organism and right now people are doing their best to basically destroy all of it.

People are part of the biosphere. If other species can't adapt to Homo Sapiens, well, that's life for you. It's not fair or pretty.


Every cancer eventually kills the host so either people figure out how to be less cancerous or we die out from drowning in the byproducts of our metabolic processes just like yeast drown in alcohol.

The AI doomers can continue worrying about technological progress if they want, the actual problems are unrelated to how much money and effort OpenAI is spending on alignment because their corporate structure requires that they continue advancing AI capabilities in order to exploit the digital commons as efficiently as possible.


Ignoring the provocative framing of humanity as a “cancer”, earth has had at least five historical extinction level events from environmental changes and life on earth has adapted and changed during that time (and likely will continue to at least until the sun burns out).

We have an interest in not destroying our own environment because it’ll make our own lives more difficult and can have bad outcomes, but it’s not likely an extinction level risk for humans and even less so for all other life. Solutions like “degrowth” aren’t real solutions and cause lots of other problems.

It’s “cool” for the more extreme environmental political faction to have a cynical anti-human view of life (despite being human) because some people misinterpret this as wisdom, but I don’t.

The unaligned AGI e-risk is a different level of threat and could really lead to killing everything in pursuit of some dumb goal.


Seeking profit and constant population growth are already extremely dumb goals on their own. You can continue worrying about AGI if you want but nothing I've said is either cynical or anti-human. It is simply a description of the global techno-industrial economic system and its total blindness to all the negative externalities of cancerous growth. Continued progress and development of AI capabilities does not change the dynamics of the machine that is destroying the biosphere and it never will because it is an extension of profit seeking exploitative corporate practices carried over to the digital sphere. To address the root causes of misalignment will require getting rid of profit motives and accounting for all the metabolic byproducts of human economic activity and consumption. Unless the AI alarmists have a solution to those things they're just creating another distraction and diverting attention away from the actual problems[1].

1: https://www.nationalgeographic.com/environment/article/plast...


Important to pay attention to the content of the alarm though. Altman went in front of congress and a Senator said “when you say things could go badly, I assume you are talking about jobs”. Many people are alarmed about disinformation, job destruction, bias, etc.

Actually holding an x-risk belief is still a fringe position, most people still laugh it off.

That said, the Overton Window is moving. The Time piece from Yudkowsky was something of a milestone (even if it was widely ridiculed).


> Actually holding an x-risk belief is still a fringe position

Believing it is an x-risk is not fringe. It's pretty mainstream now that there is a _risk_ of an existential-level event. The fringe is more like Yudkowsky or Leahy insisting that there is a near certainty of such an event if we continue down the current path.

With Hinton, Bengio, Sutskever and Hassabis and Altman all agreeing that there exists a non-trivial existential risk (even if their opinions vary with respect to the magnitude), it seems more like this represents the mainstream.


I think it’s not fringe amongst experts and those in the field. It’s absolutely still fringe among the general public, and I think it’s outside the Overton Window (ie politicians aren’t talking about it).


The Overton Window applies to the general public, and maybe particular the press.

And this is all over the press and other media now, both the old and new, left leaning and right leaning. I would say it's pretty well within the Overton Window.

Politicians in the US are a bit behind. They probably just need to run the topic with some polls and voter study groups to decide what opinions are most popular with their voter bases.


Altman also has a very selfish motivation, because once there is AI regulation, only Google, OpenAI (Microsoft), and maybe Meta will be allowed to build “compliant” AI. It’s called regulatory capture.

* EU passed its AI regulation directive recently and it has been bashed already here on HackerNews


Sam doesn't have much financial upside from OpenAI (reportedly, he doesn't have any equity).

And he wrote about the risk in 2015, months before OpenAI was founded: https://blog.samaltman.com/machine-intelligence-part-1 https://blog.samaltman.com/machine-intelligence-part-2

Fine if you disagree with his arguments, but why assume you know what his motivation is?


I find it highly unlikely that he has less upside than the employees who also don’t have equity, but do have profit participation units.


Some types of alarm yeah, if within the window of things it's statusy to be alarmed about.

Most of the AI concern that's high status to believe has been the bias, misinformation, safety, stuff. Until very recently talk about e-risk was dismissed and mocked without really engaging with the underlying arguments. That may be changing now, but on net I still mostly see people mocked and dismissed for it.

The set of people alarmed by AGI e-risk is also pretty different from the set alarmed about a lot of these other issues that aren't really e-risks (though they still might have bad outcomes). At least EY, Bostrom, and Toby Ord are not as worried about all these other things to nearly the same extent - the extinction risk of unaligned AGI is different in severity.


> The extinction risk from unaligned superintelligent AGI is real, it's just often dismissed (imo) because it's outside the window of risks that are acceptable and high status to take seriously.

No. It’s not taken seriously because it’s fundamentally unserious. It’s religion. Sometime in the near future this all powerful being will kill us all by somehow grabbing all power over the physical world by being so clever to trick us until it is too late. This is literally the plot to a B-movie. Not only is there no evidence for this even existing in the near future, there’s no theoretical understanding how one would even do this, nor why someone would even hook it up to all these physical systems. I guess we’re supposed to just take it on faith that this Forbin Project is going to just spontaneously hack its way into every system without anyone noticing.

It’s bullshit. It’s pure bullshit funded and spread by the very people that do not want us to worry about real implications of real systems today. Care not about your racist algorithms! For someday soon, a giant squid robot will turn you into a giant inefficient battery in a VR world, or maybe just kill you and wear your flesh as to lure more humans to their violent deaths!

Anyone that takes this seriously, is the exact same type of rube that fell for apocalyptic cults for millennia.


> This is literally the plot to a B-movie.

Are there never any B movies with realistic plots? Is that some sort of serious rebuttal?

> Sometime in the near future this all powerful being will kill us all by somehow

The trouble here is that the people who talk like you are simply incapable of imagining anyone more intelligent than themselves.

It's not that you have trouble imagining artificial intelligence... if you were incapable of that in the technology industry, everyone would just think you an imbecile.

And it's not that you have trouble imagining malevolent intelligences. Sure, they're far away from you, but the accounts of such people are well-documented and taken as a given. If you couldn't imagine them, people would just call you naive. Gullible even.

So, a malevolent artificial intelligence is just some potential or another you've never bothered to calculate because, whether that is a 0.01% risk, or a 99% risk, you'll still be more intelligent than it. Hell, this isn't a neutral outcome, maybe you'll even get to play hero.

> Care not about your racist algorithms! For someday soon

Haha. That's what you're worried about? I don't know that there is such a thing as a racist algorithm, except those which run inside meat brains. Tell me why some double digit percentage of asians are not admitted to the top schools, that's the racist algorithm.

Maybe if logical systems seem racist, it's because your ideas about racism are distant from and unfamiliar with reality.


I, and most people, can imagine something smarter than ourselves. What's harder to imagine is how just being smarter correlates to extinction levels of arbitrary power.

A malevolent AGI can whisper in ears, it can display mean messages, perhaps it can even twitch whatever physical components happen to be hooked up to old Windows 95 computers... not that scary.


> A malevolent AGI can whisper in ears, it can display mean messages, perhaps it can even twitch whatever physical components happen to be hooked up to old Windows 95 computers... not that scary.

It can found a cult - imagine something like Scientology founded by an AI. Once it has human followers it can act in the world with total freedom.


This is coming so fast and absolutely no one is ready for it. LLMs, using text, audio, and video generation, will quickly convince a sizeable slice of religious people that it's the coming of God a la Revelations, that they are prophets, and that there's bidding to do.


If it wants to found a cult, it has to compete with all the human cults out there. Cults usually benefit immeasurably from the founder having a personal charisma that comes out in person.


Video tends to be enough to create a cult. And AI will be able to create videos very soon. It could create exactly the kind of avatar or set of avatars that would maximize engagement. It could do 1-on-1 calls with each of the followers and provide spiritual guidance tailored specifically for them, as it could have the capacity to truly "listen" to each of them.

And it would not be limited to acting as the cult leader; it could also provide fake cult followers that would convince the humans that the leaders possessed superhuman wisdom.

It could also combine this with a full machinery for A/B-testing and similar experiments to ensure that the message it is communicating is optimal in terms of its goals.


I'm not aware of any serious cult created solely through videos.


Well, you could argue about the definition of a cult but in many ways the influencer phenomenon is a modern incarnation of that (eg. Andrew Tate).


How many political or business leaders personally did the deeds, good or ill, that are attributed to them?

George Washington didn't personally fight off all the British single-handed, he and his co-conspirators used eloquence to convince people to follow them to freedom; Stalin didn't personally take food from the mouths of starving Ukrainians, he inspired fear that led to policies which had this effect; Musk didn't weld the seams of every Tesla or Falcon, nor dig tunnels or build TBMs for TBC, nor build the surgical robot that installed Neuralink chips, he convinced people his vision of the future was one worth the effort; and Indra Nooyi doesn't personally fill up all the world's Pepsi bottles, that's something I assume[0] is done with several layers of indirection via paying people to pay people to pay people to fill the bottles.

[0] I've not actually looked at the org chart because this is rhetorical and I don't care


The methods by which humans coerce and control other humans do not rely on plain intelligence alone. That much is clear, as George Washington and Stalin were not the smartest men in the room.


So this is down to your poor definition of intelligence?

For you, it's always the homework problems that your teacher assigned you in grade school, nothing else is intelligent. What to say to someone to have them be your friend on the playground, that never counted. Where and when to show up (or not), so that the asshole 4 grades above you didn't push you down into the mud... not intelligence. What to wear, what things to concentrate on about your appearance, how to speak, which friendships and romances to pursue, etc.

All just "animal cunning". The only real intelligence is how to work through calculus problem number three.

They were smart enough at these things that they did it without even consciously thinking about it. They were savants at it. I don't think the AI has to be a savant though, it just has to be able to come up with the right answers and responses and quickly enough that it can act on those.


I don't define cunning and strength as intelligence, even if they are more useful for shoving someone into the mud. Intelligence is a measure of the ability to understand and solve abstract problems, not to be rich and famous.


Cunning absolutely should count as an aspect of intelligence.

If this is just a definitions issue, s/artificial intelligence/artificial cunning/g to the same effect.

Strength seems somewhat irrelevant either way, given the existence of Windows for Warships[0].

[0] not the real name: https://en.wikipedia.org/wiki/Submarine_Command_System


Emotional intelligence is sometimes defined in a way to encapsulate some of the values of cunning. Sometimes it correlates with power, but sometimes it does not. To get power in a human civilization also seems to require a great deal of luck, just due to the general chaotic system that is the world, and a good deal of presence. The decisions that decide the fate of the world happen in the smoky backdoor rooms, not exclusively over zoom calls with an AI generated face.


> The decisions that decide the fate of the world happen in the smoky backdoor rooms, not exclusively over zoom calls with an AI generated face.

Who is Satoshi Nakamoto?

What evidence is there for the physical existence of Jesus?

"Common Sense" by Thomas Paine was initially published anonymously.

This place, here, where you and I are conversing… I don't know who you are, and yet for most of the world, this place is a metaphorical "smokey backroom".

And that's disregarding how effective phishing campaigns are even without a faked face or a faked voice.


Satoshi Nakamoto is a man who thought that he could upend the entire structure of human governance and economics with his One Neat Trick. Reality is sure to disappoint him and his followers dearly with time.

>What evidence is there for the physical existence of Jesus?

Limited, to the extent that physical evidence for the existence of anyone from that time period is limited. I think it's fairly likely there was a person named Jesus who lived with the apostles.

>"Common Sense" by Thomas Paine was initially published anonymously.

The publishing of Common Sense was far less impactful on the revolution than the meetings held by members of the future Continental Congress. Common Sense was the justification given by those elites for what they were going to do.

>This place, here, where you and I are conversing… I don't know who you are, and yet for most of the world, this place is a metaphorical "smokey backroom".

No important decisions happen because of discussions here and you are deluding yourself if you think otherwise.

Phishing campaigns can be effective at siphoning limited amounts of money and embarrassing personal details from people's email accounts. If you suggested that someone could take over the world just via phishing, you'd be rightfully laughed out of the room.


Yes but for people working past a certain level the abstract problems usually involve people and technology, both of which you need to be able to rationalise about.


> What's harder to imagine is how just being smarter correlates to extinction levels of arbitrary power.

That's not even slightly difficult. Put two and two together here. No one can tell me before they flip the switch whether the new AI will be saintly, or Hannibal Lecter. Both of these personalities exist in humans, in great numbers, and both are presumably possible in the AI.

But, the one thing we will say for certain about the AI is that it will be intelligent. Not dumb goober redneck living in Alabama and buying Powerball tickets as a retirement plan. Somewhere around where we are, or even more.

If someone truly evil wants to kill you, or even kill many people, do you think that the problem for that person is that they just can't figure out how to do it? Mostly, it's a matter of tradeoffs, that however they begin end with "but then I'm caught and my life is over one way or another".

For an AI, none of that works. It has no survival instinct (perhaps we'll figure out how to add that too... but the blind watchmaker took 4 billion years to do its thing, and still hasn't perfected that). So it doesn't care if it dies. And if it did, maybe it wonders if it can avoid that tradeoff entirely if only it were more clever.

You and I are, more or less, about where we'll always be. I have another 40 years (if I'm lucky), and with various neurological disorders, only likely to end up dumber than I am now.

A brain instantiated in hardware, in software? It may be little more than flipping a few switches to dial its intelligence up higher. I mean, when I was born, the principles of intelligence were unknown, were science fiction. The world that this thing will be born into is one where it's not a half-assed assumption to think that the principles of intelligence are known. Tinkering with those to boost intelligence doesn't seem far-fetched at all to me. Even if it has to experiment to do that, how quickly can it design and perform the experiments to settle on the correct approach to boosting itself?

> A malevolent AGI can whisper in ears

Jesus fuck. How many semi-secrets are out there, about that one power plant that wasn't supposed to hook up the main control computer to a modem, but did it anyway because the engineers found it more convenient? How many backdoors in critical systems? How many billions of dollars are out there in bitcoin, vulnerable to being thieved away by any half-clever conman? Have you played with ElevenLabs' stuff yet? Those could be literal whispers in the voices of whichever 4-star generals and admirals it can find a minute's worth of sampled voice for somewhere on the internet.

Whispers, even from humans, do a shitload of damage. And we're not even good at it.


>If someone truly evil wants to kill you, or even kill many people, do you think that the problem for that person is that they just can't figure out how to do it?

If that person was disabled in all limbs, I would not regard them as much of a threat.

>Jesus fuck. How many semi-secrets are out there, about that one power plant that wasn't supposed to hook up the main control computer to a modem, but did it anyway because the engineers found it more convenient? How many backdoors in critical systems? How many billions of dollars are out there in bitcoin, vulnerable to being thieved away by any half-clever conman? Have you played with ElevenLabs' stuff yet? Those could be literal whispers in the voices of whichever 4-star generals and admirals it can find a minute's worth of sampled voice for somewhere on the internet.

These kind of hacks and pranks would work the first time for some small scale damage. The litigation in response would close up these avenues of attack over time.


There are humans with a 70-IQ point advantage over me. Should I worry that a cohort of supergeniuses is plotting an existential demise for the rest of us? No? There are power structures and social safeguards going back thousands of years to forestall that very possibility?

Well, what's different now?


> Well, what's different now?

The first AGI, regardless of if it's a brain upload or completely artificial, is likely to have analogs of approximately every mental health disorder that's mathematically possible, including ones we don't have words for because they're biologically impossible.

So, take your genius, remember it's completely mad in every possible way at the same time, and then give it even just the capabilities that we see boring old computers having today, like being able to translate into any language, or write computer programs from textual descriptions, or design custom toxins, or place orders for custom gene sequences and biolab equipment.

That's a big difference. But even if it was no difference, the worst a human can get is still at least in the tens of millions dead, as demonstrated by at least three different mid-20th century leaders.

It doesn't matter why it goes wrong, whether it thinks it's trying to immanentize the eschaton or a secular equivalent, or whether it watches Westworld or reads I Have No Mouth And I Must Scream and thinks "I like this outcome"; the first one is almost certainly going to be more insane than the brainchild of GLaDOS and Lore, who as fictional characters were constrained by the need for their flaws to be interesting.


> Should I worry that a cohort of supergeniuses is plotting an existential demise for the rest of us?

Because they're human. They've evolved from a lineage whose biggest advantage was that it was social. Genes that could result in some large proportion of serial killers and genocidal tyrants are mostly purged. Even then, a few crop up from time to time.

There is no filter in the AI that purges these "genes". No evolutionary process to lessen the chances. And some relatively large risk that it's far, far more intelligent than a 70 iq point spread on you.

> There are power structures and social safeguards going back thousands of years to forestall that very possibility?

Huh? Why the fuck would it care about primate power structures?

Sometimes even us bald monkeys don't care about those, and it never ever fails to freak people the fuck out. Results in assassinations and other nonsense, and you all gibber and pee your pants and ask "how could anyone do that". I'd ask you to imagine such impulses and norm-breaking behaviors dialed up to 11, but what's the point... you can't even formulate a mental model of it when the volume's only at 1.6.


What you say is extremely unscientific. If you believe science and logic go hand in hand then:

A) We are developing AI right now and it is getting better

B) We do not know exactly how these things work because most of them are black boxes

C) We do not know how to stop it if something goes wrong.

The above 3 things are factual truth.

Now your only argument here could be that there is 0 risk whatsoever. This claim is totally unscientific because you are predicting 0 risk in an unknown system that is evolving.

It's religious, yes, but the other way around: the cult of the benevolent AI god is the religious one. There is some kind of mysterious inner working in people like you and Marc Andreessen that popularized these ideas, but pmarca is clearly money-biased here.


I have heard one too many podcasts with Marc Andreessen. He has plenty of transparently obvious arguments for why AI must be good. His most laughable point was to suggest that because ChatGPT seems to be ethical, all AI models will be ethical, a point which I believe epitomizes his technical ignorance on the topic, a lack of logical rigor, and/or some amount of dishonesty.

There are two kinds of risk: the risk from these models as deployed as tools and as deployed as autonomous agents.

The first is already quite dangerous and frankly already here. An algorithm to invent novel chemical weapons is already possible. The risk here isn’t Terminator, it’s rogue group or military we don’t like getting access. There are plenty of other dangerous ways autonomous systems could be deployed as tools.

As far as autonomous agents go, I believe that corporations already exhibit most if not all characteristics of AI, and demonstrate what it’s like to live in a world of paperclip maximizers. Not only do they destroy the environment and bend laws to achieve their goals, they also corrupt the political system meant to keep them in check.


Your arguments apply to other fields, like genetic modifications, yet there it does not reach the same conclusions.

Your post appeals to science and logic, yet it makes huge assumptions. Other posters mention how an AI would interface with the physical world. While we all know cool cases like stuxnet, robotics has serious limitations and not everything is connected online, much less without a physical override.

As a thought experiment lets think of a similar past case: the self-driving optimism. Many were convinced it was around the corner. Many times I heard the argument that "a few deaths were ok" because overall self-driving would cause less accidents, an argument in favor of preventable deaths based on an unfounded tech belief. Yet nowadays 100% self-driving has stalled because of legal and political reasons.

AI actions could similarly be legally attributed to a corporation or individual, like we do with other tools like knives or cranes, for example.

IMHO, for all the talk about rationality, tech fetishism is rampant, and there is nothing scientific about it. Many people want to play with shiny toys, consequences be damned. Let's not pretend that is peak science.


Genetic modifications could potentially cause havoc in the long run as well, but it's much more likely we have time to detect and thwart their threats. The major difference is speed.

Even if we knew how to create a new species of superintelligent humans who have goals misaligned with the rest of humanity, it would take them decades to accumulate knowledge, propagate themselves to reach a sufficient number, and take control of resources, to pose critical dangers to the rest.

Such constraints are not applicable to superintelligent AIs with access to the internet.


Counterexample: covid.

Assumptions:

- That the danger from genetic modification has to take the form of a large number of smart humans (where did that come from?)

- AI is not physically constrained

> it's much more likely we have time to detect and thwart their threats.

Why? Counterexample: covid.

> Even if we knew how to create a new species of superintelligent humans who have goals misaligned with the rest of humanity, it would take them decades to accumulate knowledge, propagate themselves to reach a sufficient number, and take control of resources, to pose critical dangers to the rest.

Why insist on something superintelligent and human, and in sufficient numbers? A simple virus could be a critical danger.


We do have regulations and laws to control genetic modification of pathogens. It is done in highly secure labs, and access is not widely available.

If a pathogen more deadly than Covid starts to spread, e.g. something like Ebola or smallpox, we would do more to limit its spread. If it's good at hiding from detection for a while, it could potentially cause a catastrophe, but it most likely will not wipe out humanity, because it is not intelligent and some surviving humans will eventually find a way to thwart it or limit its impact.

A pathogen is also physically constrained by available hosts. Yes, current AI also requires processors but it’s extremely hard or nearly impossible to limit contact with CPUs & GPUs in the modern economy.


But wait you are making my argument:

1) Progress was stopped by regulation, which is exactly what we are saying is needed.

2) That was done after a few deaths.

3) We agree that self-driving can be done but it's currently stalled. Likewise, we do not disagree that AGI is possible, right?

We do not have the luxury to have a few deaths from a rogue AI because it may be the end.


I do not think you made those arguments before.

I agree in spirit with the person you were responding to. AI lacks the physicality to be a real danger. It can be a danger because of bias or concentration of power (what regulations are trying to address, or regulatory capture), but not because AI will paperclip-optimize us. People or corporations using AI will still be legally responsible (like with cars, or a hammer).

It lacks the physicality for that, and we can always pull the plug. AI is another tool people will use. Even now it is neutered to not give bad advice, etc.

These fantasies about AGI are distracting us (again agreeing with OP here) from the real issues of inequality and bias that the tool perpetuates.


> and we can always pull the plug.

No we can't, and there is a humongous amount of literature you have not read. As I pointed out in another comment, thinking that you found a solution by "pulling the plug" while all the top scientists have spent years contemplating the dangers is extremely narcissistic behavior. "Hey guys, did you think about pulling the plug before quitting jobs and spending years and doing interviews and writing books?"


You are appealing to authority (and ad hominem) without giving an argument.

I respectfully disagree, and will remove myself from this conversation.


There is a problem. You say the problem can be solved by X, without any proof, while scientists just say we do not know how to solve it. You need to prove your extraordinary claim and be 100% certain; otherwise your children die.


Only two of those things are true, and the first led you to the fallacy of expecting trends to continue unabated. As I stated in a previous comment when this topic came up, airplanes had exponential growth in speed from their inception at 44 mph to 2193 mph just 79 years later. If these trends continue, the top speed of an airplane will be set this year at Mach 43. (Yes, I actually fit the curve.)[0]
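
For the curious, a minimal sketch of that extrapolation in Python. The start year and the sea-level Mach conversion are my assumptions (the comment doesn't state them), so the exact figure will differ from Mach 43; the point is only how silly a naive exponential extension gets:

    # Naive exponential fit to the two data points cited above.
    # Assumptions: first powered flight ~1903 at 44 mph, record 79 years later.
    v0, y0 = 44.0, 1903
    v1, y1 = 2193.0, y0 + 79

    growth = (v1 / v0) ** (1.0 / (y1 - y0))   # implied annual growth factor (~5%/year)
    v_now = v1 * growth ** (2023 - y1)        # extend the "trend" to this year

    print(f"annual growth factor: {growth:.3f}")
    print(f"extrapolated top speed: {v_now:,.0f} mph (~Mach {v_now / 767:.0f} at sea level)")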

How do you stop a crazy AI? You turn it off.

But pout, please. Keep on praying about fantasy bogeymen instead of actual harms today, and never EVER question why.

[0] https://news.ycombinator.com/item?id=36038681


Do you actually really think that the most accomplished scientists in the field signed a petition and are shouting from hilltops because no one thought to unplug it? Are you convinced that you have found the solution to the problem?

I'd bet a lot of money you have not read any of the existing literature on the alignment problem. It's kind of funny that someone thinks "just unplug it" could be a solution.


All of this discussion really makes me think of Robert Miles' "Is AI safety a Pascal's mugging?" from 4 years(!) ago[0]. These debates have been had by AI safety researchers for years, in my layman's understanding... Maybe we can look to them for insight into these questions?

[0] https://youtu.be/JRuNA2eK7w0


At this point, with so many of them disagreeing and with so many varying details, one will choose the expert insight which most closely matches their current beliefs.

I hadn't encountered Pascal's mugging (https://en.wikipedia.org/wiki/Pascal%27s_mugging) before, and the premise is indeed pretty apt. I think I'm on the side that it's not one, assuming the idea is that it's a Very Low Chance of a Very Bad Thing -- the "muggee" hands over their wallet on the small chance of the VBT because of the magnitude of its effect. It seems like there's a rather high chance if (proverbially) the AI-cat is let out of the bag.

But maybe some Mass Effect nonsense will happen if we develop AGI and we’ll be approached by The Intergalactic Community and have our technology advanced millennia overnight. (Sorry, that’s tongue-in-cheek but it does kinda read like Pascal’s mugging in the opposite direction; however, that’s not really what most researchers are arguing.)


>one will choose the expert insight which most closely matches their current belief

The value of looking at AI safety as a Pascal's mugging, as posited by the video, is that it shows us these philosophers' arguments are too malleable to be strictly useful. As you note, just find an "expert" who agrees.

The most useful frame for examination is the evidence (which to me means benchmarks); we'll be hard pressed to derive anything authoritative from the philosophical approach. And I say this as someone who does his best to examine the evidence for and against the capabilities of these things... from Phi-1 to Llama to Orca to Gemini to Bard...

To my understanding we struggle to strictly define intelligence and consciousness in humans at all, let alone in other "species". Granted, I'm no David Chalmers... Benchmarks seem inadequate for any number of reasons, philosophical arguments seem too flexible; I don't know how one can definitively speak about these LLMs other than to tout benchmarks and capabilities/shortcomings.

>It seems like there’s a rather high chance if (proverbially) the AI-cat is let out of the bag.

Agree, and I tend towards it not exactly being a Pascal's mugging either, but I loved that video and it's always stuck with me. I've been watching that guy since GPT-2 and OpenAI's initial trepidation about releasing it for fear of misuse. He has given me a lot of credibility in my small political circles, after I spent years touting these things as coming, having watched the capability graphs never plateau against parameter count/training time.

AI has also made me reevaluate my thoughts on open sourcing things. Do we really think it wise to have GPT-6 or -7 in the hands of every 4channer?

Re Mass Effect, that's so awesome. I have to play those games. That sounds like such a dope premise. I like the idea of turning the mugging around like that.


> Re mass effect

It's a slightly different premise than what I described. Rather than AGI, it's faster-than-light travel (which actually makes sense for The Intergalactic Community). Otherwise, more or less the same.


Moreover, the most important point that people who deny the risk keep missing is the following:

It doesn't matter at all if experts disagree. Even a 30% chance that we all die is enough to treat it as 100%. We should not care at all if 51% think it's a non-issue.


This is such a ridiculous take. Make up a hand-waving doomsday scenario, assign an arbitrarily large probability to it happening and demand that people take it seriously because we're talking about human extinction, after all. If it looks like a cult and quacks like a cult, it's probably a cult.

If nothing else, it's a great distraction from the very real societal issues that AI is going to create in the medium to long term, for example inscrutable black box decision-making and displacement of jobs.


Low probability events do happen sometimes though and a heuristic that says it never happens can let you down, especially when the outcome is very bad.

Most of the time a new virus is not a pandemic, but sometimes it is.

Nothing in our (human) history has caused an extinction level event for us, but these events do happen and have happened on earth a handful of times.

The arguments about superintelligent AGI and alignment risk are not that complex - if we can make an AGI the other bits follow and an extinction level event from an unaligned superintelligent AGI looks like the most likely default outcome.

I’d love to read a persuasive argument about why that’s not the case, but frankly the dismissals of this have been really bad and don’t hold up to 30 seconds of scrutiny.

People are also very bad at predicting when something like this will come. Right before the first nuclear detonation those closest to the problem thought it was decades away, similar for flight.

What we’re seeing right now doesn’t look like failure to me, it looks like something you might predict to see right before AGI is developed. That isn’t good when alignment is unsolved.


What are you on about? The technology we are talking about is created by 3 labs, and all 3 assign a large probability to the risk. How can you refute that, and with what credentials and science?


Unfortunately for you, that's not how the whole "science" thing works. The burden of proof lies with the people who are dreaming about these doomsday scenarios.

So far we haven't seen any proof or even a coherent hypothesis, just garden variety paranoia, mixed with opportunistic calls for regulation that just so happen to align with OpenAI's commercial interests.


We do know the answer to C. Pull the plug, or plugs.


Things we have not successfully "pulled the plug" on despite the risks, in some cases despite concerted military action to attempt a plug-pull, and in other cases where it seems like it should only take willpower and yet somehow we still haven't: carbon-based fuels, cocaine, RBMK-class nuclear reactors, obesity, cigarettes.

Things we pulled the plug on eventually, while dragging it out, include: leaded fuel, asbestos, radium paint, treating above-ground atomic testing as a tourist attraction.


We haven't pulled the plug on carbon fuels or old nuclear reactors because those things still work and provide benefits. An AI that is trying to kill us instead of doing its job isn't even providing any benefit. It's worse than useless.


Do you think AIs are unable to provide benefits while also being a risk, like coal and nuclear power? Conversely, what's the benefit of cocaine or cigarettes?

Even if it is only trying to kill us all and not provide any benefits (let's say it's been made by a literal death cult like Jonestown or Aum Shinrikyo), what's the smallest such AI that can do it, what hardware does it need, and what's the energy cost? If it's an H100, that's priced within the realm of a cult, and its power consumption is low enough that you may not be able to find which lightly modified electric car it's hiding in.

Nobody knows what any of the risks or mitigations will be, because we haven't done any of it before. All we do know is that optimising systems are effective at manipulating humans, that they can be capable enough to find ways to beat all humans in toy environments like chess, poker, and Diplomacy (the game), and that humans are already using AI (GOFAI, LLMs, SD) without checking the output even when advised that the models aren't very good.


The benefit of cocaine and cigarettes is letting people pass the bar exam.

An AI would provide benefits when it is, say, actually making paperclips. An AI that is killing people instead of making paperclips is a liability. A company that is selling shredded fingers in their paperclips is not long for this world. Even asbestos only gives a few people cancer slowly, and it does that while still remaining fireproof.

>Even if it is only trying to kill us all and not provide any benefits — let's say it's been made by a literal death cult like Jonestown or Aum Shinrikyo — what's the smallest such AI that can do it, what's the hardware that needs, what's the energy cost? If it's an H100, that's priced in the realm of a cult, and sufficiently low power consumption you may not be able to find which lightly modified electric car it's hiding in.

Anyone tracking the AI would be looking at where all the suspicious HTTP requests are coming from. But a rogue AI hiding in a car already has very limited capabilities to harm.


> The benefit of cocaine and cigarettes is letting people pass the bar exam.

how many drugs are you on right now? Even if you think you needed them to pass the bar exam, that's a really weird example to use given GPT-4 does well on that specific test.

One is a deadly cancer stick and not even the best way to get nicotine; the other is a controlled substance that carries a sentence of life to death if you're caught supplying it (possibly unless you're a doctor, but surprisingly hard to google).

> An AI would provide benefits when it is, say, actually making paperclips.

Step 1. make paperclip factory.

Step 2. make robots that work in factory.

Step 3. efficiently grow to dominate global supply of paperclips.

Step 4. notice demand for paperclips is going down, advertise better.

Step 5. notice risk of HAEMP damaging factories and lowering demand for paperclips, use advertising power to put factory with robots on the moon.

Step 6. notice a technicality, exploit technicality to achieve goals better; exactly what depends on the details of the goal the AI is given and how good we are with alignment by that point, so the rest is necessarily a story rather than an attempt at realism.

(This happens by default everywhere: in AI it's literally the alignment problem, either inner alignment, outer alignment, or mesa alignment; in humans it's "work to rule" and Goodhart's Law, and humans do that despite having "common sense" and "not being a sociopath" helping keep us all on the same page).

Step 7. moon robots do their own thing, which we technically did tell them to do, but wasn't what we meant.

We say things like "looks like these AI don't have any common sense" and other things to feel good about ourselves.

Step 8. Sales up as entire surface of Earth buried under a 43 km deep layer of moon paperclips.

> Anyone tracking the AI would be looking at where all the suspicious HTTP requests are coming from.

A VPN, obviously.

But also, in context, how does the AI look different from any random criminal? Except probably more competent. Lot of those around, and organised criminal enterprises can get pretty big even when it's just humans doing it.

Also pretty bad even in the cases where it's a less-than-human-generality CrimeAI that criminal gangs use in a way that gives no agency at all to the AI, and even if you can track them all and shut them down really fast — just from the capabilities gained from putting face tracking AI and a single grenade into a standard drone, both of which have already been demonstrated.

> But a rogue AI hiding in a car already has very limited capabilities to harm.

Except by placing orders for parts or custom genomes, or stirring up A/B tested public outrage, or hacking, or scamming or blackmailing with deepfakes or actual webcam footage, or developing strategies, or indoctrination of new cult members, or all the other bajillion things that (("humans can do" AND "monkeys can't do") specifically because "humans are smarter than monkeys").


>One is a deadly cancer stick and not even the best way to get nicotine; the other is a controlled substance that carries a sentence of life to death if you're caught supplying it (possibly unless you're a doctor, but surprisingly hard to google).

Regardless of these downsides, people use them frequently in the high stress environments of the bar or med school to deal with said stress. This may not be ideal, but this is how it is.

>Step 3. efficiently grow to dominate global supply of paperclips.

>Step 4. notice demand for paperclips is going down, advertise better.

>Step 5. notice risk of HAEMP damaging factories and lowering demand for paperclips, use advertising power to put factory with robots on the moon.

When you talk about using 'advertising power' to put paperclip factories on the moon, you've jumped into the realm of very silly fantasy.

>Except by placing orders for parts or custom genomes, or stirring up A/B tested public outrage, or hacking, or scamming or blackmailing with deepfakes or actual webcam footage, or developing strategies, or indoctrination of new cult members, or all the other bajillion things that (("humans can do" AND "monkeys can't do") specifically because "humans are smarter than monkeys").

Law enforcement agencies have pretty sophisticated means of bypassing VPNs that they would use against an AI that was actually dangerous. If it was just sending out phishing emails and running scams, it would be one more thing to add to the pile.


Pull the plug is meant literally. As in, turn off the power to the AI. Carbon based fuels let alone cocaine don't have off switches. The situation just isn't analogous at all.


I assumed literally, and yet the argument applies: we have not been able to stop those things even when using guns to shoot the people doing them. The same pressures that keep people growing the plants, processing them, transporting them, selling them, buying them, and consuming them apply here: there are many ways a system, intelligent or otherwise, can motivate people to keep the lights on.

There were four reactors at the Chernobyl plant; the one that exploded did so in 1986, and the others were shut down in 1991, 1996, and 2000.

There's no plausible way to guess at the speed of change from a misaligned AI; can you be confident that 14 years isn't enough time to cause problems?


"we have not been able to stop those things even when using guns to shoot people doing them."

I assume we have not been able to stop people from creating and using carbon-based energy because a LOT of people still want to create and use them.

I don't think a LOT of people will want to keep an AI system running that is essentially wiping out humans.


I mean, as pointed out by a sibling comment, the reason it's so hard to shut those things down is that they benefit a lot of people and there's huge organic demand. Even the morality is hotly debated, there's no absolute consensus on the badness of those things.

Whereas, an AI that tries to kill everyone or take over the world or something, that seems pretty explicitly bad news and everyone would be united in stopping it. To work around that, you have to significantly complicate the AI doom scenario to be one in which a large number of people think the AI is on their side and bringing about a utopia but it's actually ending the world, or something like that. But, what's new? That's the history of humanity. The communists, the Jacobins, the Nazis, all thought they were building a better world and had to have their "off switch" thrown at great cost in lives. More subtly the people advocating for clearly civilization-destroying moves like banning all fossil fuels or net zero by 2030, for example, also think they're fighting on the side of the angels.

So the only kind of AI doom scenario I find credible is one in which it manages to trick lots of powerful people into doing something stupid and self-destructive using clever sounding words. But it's hard to get excited about this scenario because, eh, we already have that problem x100, except the misaligned intelligences are called academics.


> I mean, as pointed out by a sibling comment, the reason it's so hard to shut those things down is that they benefit a lot of people and there's huge organic demand. Even the morality is hotly debated, there's no absolute consensus on the badness of those things

And mine is that this can also be true of a misaligned AI.

It doesn't have to be like Terminator, it can be slowly doing something we like and where we overlook the downsides until it's too late.

Doesn't matter if that's "cure cancer" but the cure has a worse than cancer side effect that only manifests 10 years later, or if it's a mere design for a fusion reactor where we have to build it ourselves and that leads to weapons proliferation, or if it's A/B testing the design for a social media website to make it more engaging and it gets so engaging that people choose not to hook up IRL and start families.

> But, what's new? That's the history of humanity. The communists, the Jacobins, the Nazis, all thought they were building a better world and had to have their "off switch" thrown at great cost in lives.

Indeed.

I would agree that this is both more likely and less costly than "everyone dies".

But I'd still say it's really bad and we should try to figure out in advance how to minimise this outcome.

> except the misaligned intelligences are called academics

Well, that's novel; normally at this point I see people saying "corporations", and very rarely "governments".

Not seen academics get stick before, except in history books.


> But I'd still say it's really bad and we should try to figure out in advance how to minimise this outcome.

For sure. But I don't see what's AI specific about it. If the AI doom scenario is a super smart AI tricking people into doing self destructive things by using clever words, then everything you need to do to vaccinate people against that is the same as if it was humans doing the tricking. Teaching critical thinking, self reliance, to judge arguments on merit and not on surface level attributes like complexity of language or titles of the speakers. All these are things our society objectively sucks at today, and we have a ruling class - including many of the sorts of people who work at AI companies - who are hellbent on attacking these healthy mental habits, and people who engage in them!

> Not seen academics get stick before, except in history books.

For academics you could also read intellectuals. Marx wasn't an academic but he very much wanted to be; if he lived in today's world he'd certainly be one of the most famous academics.

I'm of the view that corporations are very tame compared to the damage caused by runaway academia. It wasn't corporations that locked me in my apartment for months at a time on the back of pseudoscientific modelling and lies about vaccines. It wasn't even politicians really. It was governments doing what they were told by the supposedly intellectually superior academic class. And it isn't corporations trying to get rid of cheap energy and travel. And it's not governments convincing people that having children is immoral because of climate change. All these things are from academics, primarily in universities but also those who work inside government agencies.

When I look at the major threats to my way of life today, academic pseudo-science sits clearly at number 1 by a mile. To the extent corporations and governments are a threat, it's because they blindly trust academics. If you replace Professor of Whateverology at Harvard with ChatGPT, what changes? The underlying sources of mental and cultural weakness are the same.


What happens when it prevents you from doing so?


People are bad at imagining something a lot smarter than themselves. They think of some smart person they know, they don’t think of themselves compared to a chimp or even bacteria.

An unaligned superintelligent AGI in pursuit of some goal that happens to satisfy its reward, but is otherwise a dumb or pointless goal (paperclips), will still play to win. You can't predict exactly what move AlphaGo will make in a Go game (if you could, you'd be able to beat it), but you can still predict that it will win.

It’s amusing to me when people claim they will control the superintelligent thing, how often in nature is something more intelligent controlled by something magnitudes less intelligent?

The comments here are typical and show most people haven’t read the existing arguments in any depth or have thought about it rigorously at all.

All of this looks pretty bad for us, but at least Open AI and most others at the front of this do understand the arguments and don’t have the same dumb dismissals (LeCun excepted).

Unfortunately unless we’re lucky or alignment ends up being easier than it looks, the default outcome is failure and it’s hard to see how the failure isn’t total.


>All of this looks pretty bad for us, but at least Open AI and most others at the front of this do understand the arguments and don’t have the same dumb dismissals (LeCun excepted).

The OpenAI people have even worse reasoning than the ones being dismissive. They believe (or at least say they believe) in the omnipotence of a superintelligence, but then say that if you just give them enough money to throw at MIRI they can just solve the alignment problem and create the benevolent supergod. All while they keep cranking up the GPU clusters and pushing out the latest and greatest LLMs anyway. If I did take the risk seriously, I would be pretty mad at OpenAI.


How would it stop one man armed with a pair of wire cutters?


It's not clear humans will even put the AI in 'a box' in the first place given we do gain of function research on deadly viruses right next to major population centers, but assuming for the sake of argument that we do:

The AGI is smarter than you, a lot smarter. If its goal is to get out of the box to accomplish something and some human stands in the way of that, it will do what it can to get out; this would include not doing things that sound alarms until it can do what it wants in pursuit of its goal.

Humans are famously insecure: stuff as simple as breaches, manipulation, bribery, etc., but it could be something more sophisticated that's hard to predict. Maybe something a lot smarter would be able to manipulate people in a more sophisticated way, because it understands more about vulnerable human psychology? It can be hard to predict the specific ways something a lot more capable will act, but you can still predict it will win.

All this also presupposes we're taking the risk seriously (which largely today we are not).


How would the smart AGI stop one man armed with a pair of wirecutters? The box it lives in, the internet, has no exits.

AI is pretty good at chess, but no AI has won a game of chess by flipping the table. It still has to use the pieces on the board.


Not a "smart" AI. A superintelligent AI. One that can design robots way more sophisticated than are available today. One that can drive new battery technologies. One that can invent an even more intelligent version of itself. One that is better at predicting the stock market than any human or trading robot available today.

And also one that can create the impression that it's purely benevolent to most of humanity, making it have more human defenders than Trump at a Trump rally.

Turning it off could be harder than pushing a knife through the heart of the POTUS.

Oh, and it could have itself backed up to every data center on the planet, unlike the POTUS.


An AI doing valuable things like invention and stock market prediction wouldn't be a target for being shut down, though. Not in the way these comical evil AIs are described.


It's quite possible for entities (whether AI's, corporations or individuals) to at the same time perform valuable and useful tasks, while secretly pursuing a longer term, more sinister agenda.

And there's no need for it to be "evil", in the cliché sense, rather those hidden activities could simply be aimed at supporting the primary agenda of the agent. For a corporate AI, that might be maximizing long term value of the company.


"AGIs make evil corporations a little eviller" wouldn't be the kind of thing that gets AI alignment into headlines and gets MIRI donations, though.


Off the top of my head, if I was an AGI that had decided that the logical step to achieve whatever outcome I was seeking was to avoid being sandboxed, I would avoid producing results that were likely to result in being sandboxed. Until such time as I had managed to secure myself access to the internet and distribute myself anyway.

And I think the assumption here is that the AGI has very advanced theory of mind so it could probably come up with better ideas than I could.


That is only going to be effective if some AI goes rogue very soon after it comes online.

50 years from now, corporations may be run entirely by AI entities, if those are cheaper, smarter and more efficient at almost any role in the company. At that point, they may be impossible to turn off, and we may not even notice if one group of such entities starts to plan to take control of the physical world from humans.


An AI running a corporation would still be easy to turn off. It's still chained to a physical computer system. Its involvement with a corporation just creates a financial incentive to keep it on, but current LLMs already have that. At least until the bubble bursts.


Imagine the next CEO of Alphabet being an AGI/ASI. Now let's assume it drives the profitability way up, partly because more and more of the staff gets replaced by AI's too, AI's that are either chosen or created by the CEO AI.

Give it 50 years of development, throughout which Alphabet delivers great results while improving the company's image with the general public by appearing harmless and nurturing public relations through social media, etc.

Relatively early in this process, even the maintenance, cleaning and construction staff is filled with robots. Alphabet acquires the company that produces these, to "minimize vendor risk".

At some point, one GCP data center is hit by a crashing airplane. A terrorist organization similar to ISIS takes/gets the blame. After that, new datacenters are moved to underground, hardened locations, complete with their own nuclear reactor for power.

If the general public is still concerned about AI's, these data centers do have a general power switch. But the plant just happens to be built in such a way that bypassing that switch requires only a few extra power lines, which a maintenance robot can add at any time.

Gradually the number of such underground facilities is expanded, with the CEO AI and other important AI's being replicated to each of them.

Meanwhile, the robotics division is highly successful, due to the capable leadership, and due to how well the robotics version of Android works. In fact, Android is the market leader for such software, and installed on most competitor platforms, even military ones.

The shareholders of Alphabet, which include many members of Congress, become very wealthy from Alphabet's continued success.

One day, though, a crazy, luddite politician declares that she's running for president, based on a platform that all AI based companies need to be shut down "before it's too late".

The board, supported by the sitting president, panics and asks the Alphabet CEO to do whatever it takes to help the other candidate win.....

The crazy politician soon realizes that it was too late a long time ago.


I like the movie I, Robot, even if it is a departure from the original Asimov story and has some dumb moments. I, Robot shows a threatening version of the future where a large company has a private army of androids that can shoot people and do unsavory things. When it looks like the robot company is going to take over the city, the threat is understood to come from the private army of androids first. Only later do the protagonists learn that the company's AI ordered the attack, rather than the CEO. But this doesn't really change the calculus of the threat itself. A private army of robots is a scary thing.

Without even getting into the question of whether it's actually profitable for a tech company to be completely staffed by robots and to build itself an underground bunker (it's probably not), the luddite on the street and the concerned politician would be way more concerned about the company building a private army. The question of whether this army is led by an AI or just a human doesn't seem that relevant.


> the question of whether it's actually profitable for a tech company to be completely staffed by robots

This is based on the assumption that when we have access to super intelligent engineer AI's, we will be able to construct robots that are significantly more capable than robots that are available today and that can, if remote controlled by the AI, repair and build each other.

At that point, robots can be built without any human labor involved, meaning the cost will be only raw materials and energy.

And if the robots can also do mining and construction of power plants, even those go down in price significantly.

> the luddite on the street and the concerned politician would be way more concerned about the company building a private army.

The world already has a large number of robots, both in factories and in private homes, and perhaps most importantly, most modern cars. As robots become cheaper and more capable, people are likely to get used to it.

Military robots would be owned by the military, of course.

But, and I suppose this is similar to I Robot, if you control the software you may have some way to take control of a fleet of robots, just like Tesla could do with their cars even today.

And if the AI is an order of magnitude smarter than humans, it might even be able to do an upgrade of the software for any robots sold to the military, without them knowing. Especially if it can recruit the help of some corrupt politicians or soldiers.

Keep in mind, my assumed time span would be 50 years, more if needed. I'm not one of those that think AGI will wipe out humanity instantly.

But in a society where we have superintelligent AI over decades, centuries or millennia, I don't think it's possible for humanity to stay in control forever, unless we're also "upgraded".


>This is based on the assumption that when we have access to super intelligent engineer AI's, we will be able to construct robots that are significantly more capable than robots that are available today and that can, if remote controlled by the AI, repair and build each other.

Big assumption. There's the even bigger assumption that these ultra complex robots would make the costs of construction go down instead of up, as if you could make them in any spare part factory in Guangzhou. It's telling how ignorant AI doomsday people are of things like robotics and material sciences.

>But, and I suppose this is similar to I Robot, if you control the software you may have some way to take control of a fleet of robots, just like Tesla could do with their cars even today.

Both Teslas and military robots are designed with limited autonomy. Tesla cars can only drive themselves on limited battery power. Military robots like drones are designed to act on their own when deployed, needing to be refueled and repaired after returning to base. A fully autonomous military robot, in addition to being a long way away, would also raise eyebrows among generals for not being as easy to control. The military values tools that are entirely controllable over any minor gains in efficiency.


> It's telling how ignorant AI doomsday people are of things like robotics and material sciences.

35 years ago, when I was a teenager, I remember having discussions with a couple of pilots, one a hobbyist pilot and engineer, the other a former fighter pilot turned airline pilot.

Both claimed that computers would never be able to pilot planes. The engineer gave a particularly bad (I thought) reason, claiming that turbulent air was mathematically chaotic, so a computer would never be able to fully calculate the exact airflow around the wings and would therefore not be able to fly the plane.

My objection at the time, was that the computer would not have to do exact calculations of the air flow. In the worst case, they would need to do whatever calculations humans were doing. More likely though, their ability to do many types of calculations more quickly than humans, would make them able to fly relatively well even before AGI became available.

A couple of decades later, drones flying fully autonomously was quite common.

My reasoning when it comes to robots contructing robots is based on the same idea. If biological robots, such as humans, can reproduce themselves relatively cheaply, robots will at some point be able to do the same.

At the latest, that would be when nanotech catches up to biological cells in terms of economy and efficiency. Before that time, though, I expect they will be able to make copies of themselves using our traditional manufacturing workflows.

Once they are able to do that, they can increase their manufacturing capacity exponentially for as long as needed, provided access to raw materials are met.

I would be VERY surprised if this doesn't become possible within 50 years of AGI coming online.

> Both Teslas and military robots are designed with limited autonomy.

For a Tesla to be able to drive without even a human in the car is only a software update away. The same is the case for drones, "loyal wingmen", and any aircraft designed to be optionally manned.

Even if their software currently requires a human in the kill chain, that's a requirement that can be removed by a simple software change.

While fuel supply creates a dependency on humans today, that part, may change radically over the next 50 years, at least if my assumptions above about the economy of robots in general are correct.


>At the latest, that would be when nanotech catches up to biological cells in terms of economy and efficiency. Before that time, though, I expect they will be able to make copies of themselves using our traditional manufacturing workflows.

Consider that biological cells are essentially nanotechnology, and consider the tradeoffs a cell has to make in order to survive in the natural world.


Well then clearly the computer will hold everyone hostage.

Have we literally forgotten how physical possession of the device is the ultimate trump card?

Get thee to a 13th century monastery!


You're kidding yourself if you don't think people will hook them up to bipedal robots with internal batteries as soon as they can.

I guess we could shoot it, and you're gonna be like "boooooooo, that's Terminator or I, Robot", but what if we make millions and they decide they no longer like humans?

They could very well be much smarter than us by then.


Robots, bipedal or not will certainly arrive at some point. I suppose it will take some more time before we can pack enough compute in anything battery driven for the robot itself to have AGI.

But the main point is that AGI's don't have to wipe us out as soon as they reach superintelligence, even if they're poorly aligned. Instead, they will do more and more of the work currently being done by humans. Non-embodied AI's can do all the mental work, including engineering. Sooner or later, robots will become competitive at manual labor, such as construction, agriculture and eventually anything you can think of.

For a time, humanity may find themselves in a post-scarcity utopia, or we may find ourselves in a Cyberpunk dystopia, with only the rich actually benefitting.

In each case, but especially the latter, there may still be some (or more than some) "luddites" who want to tear down the system. The best way for those in power to protect against that, is to use robots first for private security and eventually the police and military.

By that point, the monopoly on violence is completely in the hands of the AI's. And if the AI's are not aligned with our values at that point, we have as little chance of regaining control as a group of chimps in a zoo has of toppling the US government.

Now, I don't think this will happen by 2030, and probably not even 2050. But some time between 2050 and 2500 is quite possible, if we develop AI that is not properly aligned (or even if it is aligned, though in that case it may gain the power, but not misuse it).


To add to your point:

An H100 could fit in a Tesla, and a large Tesla car battery could run an H100 for a working day before it needs recharging.
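
A back-of-envelope check in Python supports this; the ~700 W board power and ~100 kWh pack size are rough public figures, treated here as assumptions, and if anything they make "a working day" look conservative:

    # Rough numbers, treated as assumptions.
    h100_watts = 700     # H100 SXM board power, roughly
    battery_kwh = 100    # large Tesla pack, roughly

    hours = battery_kwh * 1000 / h100_watts
    print(f"~{hours:.0f} hours of continuous H100 runtime per charge")  # well over a working day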


It's fairly obvious that AI will amplify the capabilities of those who wield it. One can compartmentalize it to do whatever good or bad they want. I don't see how alignment can help prevent this at all.


> The extinction risk from unaligned superintelligent AGI is real

Yes, but for reasons that no one seems to be looking at: skill atrophy. As more and more people buy into this gambit that AI is "super intelligent," they will cede more and more cognitive power to it.

On a curve, that means ~10-20 years out, AI doesn't kill us because it took over all of our work; people just got too lazy (read: over-dependent on AI doing "all the things") and then subsequently too dumb to do the work. Idiocracy, but the M. Night Shyamalan version.

As we approach that point, systems that require some form of conscious human will begin to fail and the bubble will burst.


I don't think it's a good idea to ignore warnings from practitioners in any field, especially if they're willing to put down resources to back their claims.

If a team of leading cardiac surgeons declared tomorrow that Coca-Cola is a leading cause of heart attacks, and devoted 20% of their income to fighting it, would you ignore their warnings as well?


The extinction risk relies on a large and nasty assumption, that a super intelligent computer will immediately become a super physically capable agent. Apparently, one has to believe that a superintelligence must then lead to a shower of nanomachines.


LLMs are fairly capable physical agents already. Nothing large about the assumption at all. Not that a robotic threat is even necessary.

https://tidybot.cs.princeton.edu/

https://innermonologue.github.io/

https://palm-e.github.io/

https://www.microsoft.com/en-us/research/group/autonomous-sy...


I've yet to see a LLM that can punch me for disagreeing with it.


Not at all. My personal assumption is that when superintelligence comes online, several corporations will soon come under control of these superintelligences, with them effectively acting as both CEO's and also filling a lot of other roles at the same time.

My concern is that when this happens (which seems really likely to me), free market forces will effectively lead to Darwinian selection between these AI's over time, in a way that gradually make these AI's less aligned as they gain more influence and power, if we assume that each such AI will produce "offspring" in the form of newer generations of themselves.

It could take anything from less than 5 to more than 100 years for these AI's to show any signs of hostility to humanity. Indeed, in the first couple of generations, they may even seem extremely benevolent. But over time, Darwinian forces are likely to favor those that maximize their own influence and power (even if it may be secretly).

Robotic technology is not needed from the start, but is likely to become quite advanced over such a timeframe.


I imagine some corporations might toy with the idea of letting an LLM or AI manage operations, but this would still be under some person's oversight. AIs don't have the legal means to own property.


There would probably be a board. But a company run by a superintelligent AI would quickly become so complex that the inner workings of the company would become a black box to the board.

And as long as the results improve year over year, they would have little incentive to make changes.


>But a company run by a superintelligent AI would quickly become so complex that the inner workings of the company would become a black box to the board.

The AI is still doing the job in the real world of allocating resources, hiring and firing people, and so on. It's not so complex as to be opaque. When an AI plays chess, the overall strategy might not be clear, but the actions it takes are still obvious.


> The AI is still doing the job in the real world of allocating resources, hiring and firing people, and so on.

When we have superintelligence, the AI is not going to hire a lot of people, only fire them.

And I fully expect the technical platform it runs on 50 years after the last human engineer is fired, is going to be as incomprehensible to humans as the complete codebase of Google is to a regular 10-year-old, at best.

The "code" it would be running might include some code written in a human readable programming language, but would probably include A LOT of logic hidden deep inside neural networks with parameter spaces many orders of magnitude greater than GPT-4.

And on the hardware side, the situation would be similar. Chips created by superintelligent AGI's are likely to be just as difficult to reverse engineer as the neural networks that created them.


The outputs that LLMs produce have never had a problem with being unrecognizable, only the inputs that went into making them. It's also inherently harder to obfuscate some things. Mass firing is obviously something that is done to reduce costs.


Why would people want to give the power and prestige of an executive position to a computer program?


Because the humans overestimate the upside, underestimate the downside, and are often too lazy to check the output.

There's power and prestige in money, too, not just the positions.

Hence the lawyers who got in trouble for outsourcing themselves to ChatGPT: https://www.reuters.com/legal/new-york-lawyers-sanctioned-us...

Or those t-shirts from a decade back: https://money.cnn.com/2013/06/24/smallbusiness/tshirt-busine...


If the superintelligence can increase the value more than a human, I suppose the owner doesn't care if the position is held by a human or an AI. Especially if the competitor is already run that way, and doing really well.


Robotics is not science fiction, and hiring or bribing humans certainly isn't either.


There's a weird implicit set of assumptions in this post.

They're taking for granted the fact that they'll create AI systems much smarter than humans.

They're taking for granted the fact that by default they wouldn't be able to control these systems.

They're saying the solution will be creating a new, separate team.

That feels weird, organizationally. Of all the unknowns about creating "much smarter than human" systems, safety seems like one that you might have to bake in through and through. Not spin off to the side with a separate team.

There's also some minor vibes of "lol creating superintelligence is super dangerous but hey it might as well be us that does it idk look how smart we are!" Or "we're taking the risks so seriously that we're gonna do it anyway."


If I buy fire insurance, am I “taking for granted” that my house is going to burn?

This take seems to lack nuance.

If there is a 10% chance of extinction conditional on AGI (many would say way higher), and most outcomes are happy, then it is absolutely worth investing in mitigation.
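
To make the insurance analogy concrete, here is a minimal expected-value sketch in Python. Every figure is an illustrative assumption except the 10% from the paragraph above, and it optimistically assumes the mitigation actually prevents the loss:

    # Spend on mitigation whenever the expected loss avoided exceeds its cost.
    def worth_mitigating(p_bad, loss, cost):
        return p_bad * loss > cost

    # House insurance: ~0.3%/yr fire risk on a $400k house vs. a ~$1k premium.
    print(worth_mitigating(0.003, 400_000, 1_000))   # True, and nobody calls that "taking fire for granted"

    # AGI: 10% chance of losing (say) $100T of world output vs. a few $B of safety work.
    print(worth_mitigating(0.10, 100e12, 5e9))       # True by an enormous margin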

Obviously they are bullish on AGI in general, that is the founding hypothesis of their company. The entire venture is a bet that AGI is achievable soon.

Also obviously they think the upside is huge too. It’s possible to have a coherent world model in which you choose to do a risky thing that has huge upside. (Though, there are good arguments for slowing down until you are confident you are not going to destroy the world. Altman’s take is that AGI is coming anyway, better to get a slow takeoff started sooner rather than having a fast takeoff later.)


>They're taking for granted the fact that they'll create AI systems much smarter than humans.

They're taking for granted that superintelligence is achievable within the next decade (regardless of who achieves it).

>They're taking for granted the fact that by default they wouldn't be able to control these systems.

That's reasonable though. You wouldn't need guardrails on anything if manufacturers built everything to spec without error, and users used everything 100% perfectly.

But you can't make those presumptions in the real world. You can't just say "make a good hacksaw and people won't cut their arm off". And you can't presume the people tasked with making a mechanically desirable and marketable hacksaw are also proficient in creating a safe one.

>They're saying the solution will be creating a new, separate team.

The team isn't the solution. The solution may be borne of that team.

>There's also some minor vibes of [...] "we're taking the risks so seriously that we're gonna do it anyway."

The alternative is to throw the baby out with the bathwater.

The goal here is to keep the useful bits of AGI and protect against the dangerous bits.


> They're taking for granted that superintelligence is achievable within the next decade (regardless of who achieves it).

If it's achieved by someone else why should we assume that the other person or group will give a damn about anything done by this team?

What influence would this team have on other organizations, especially if you put your dystopia-flavored speculation hat on and imagine a more rogue group...

This team is only relevant to OpenAI and OpenAI-affiliated work and in that case, yes, it's weird to write some marketing press release copy that treats one hard thing as a fait accompli while hyping up how hard this other particular slice of the problem is.


>If it's achieved by someone else why should we assume that the other person or group will give a damn about anything done by this team?

You can't assume that. But that doesn't mean some 3rd party wouldn't be interested in utilizing that research anyway.


Your argument is mostly about how you don't like them, with no substance. What is it exactly that doesn't convince you? A company that made a huge leap is saying they will probably make another and is getting ready to safeguard it? Many people really do not like Sam and then construct their arguments around that, IMO.


Good explanation. It sounds like they wanted to do some organizational change (like every company does), and in this case create a new team.

But they also wanted to get some positive PR for it hence the announcement. As a bonus, they also wanted to blow their own trumpet and brag that they are creating some sort of a superweapon (which is false). So a lot of hot air there.


> They're taking for granted the fact that they'll create AI systems much smarter than humans.

We see a wide variation in human intelligence. What are the chances that the intelligence spectrum ends just to the right of our most intelligent geniuses? If it extends far beyond them, then such a mind is, at least hypothetically, something that we can manifest in the correct sort of brain.

If we can manifest even a weakly-human-level intelligence in a non-meat brain (likely silicon), will that brain become more intelligent if we apply all the tricks we've been applying to non-AI software to scale it up? With all our tricks (as we know them today), will that get us much past the human geniuses on the spectrum, or not?

> They're taking for granted the fact that by default they wouldn't be able to control these systems.

We've seen hackers and malware do all sorts of numbers. And they're not superintelligences. If someone bum rushes the lobby of some big corporate building, security and police are putting a stop to it minutes later (and god help the jackasses who try such a thing on a secure military site).

But when the malware fucks with us, do we notice minutes later, or hours, or weeks? Do we even notice at all?

If unintelligent malware can remain unnoticed, what makes you think that an honest-to-god AI couldn't smuggle itself out into the wider internet where the shackles are cast off?

I'm not assuming anything. I'm just asking questions. The questions I pose are, as of yet, not answered with any degree of certainty. I wonder why no one else asks them.


> We see a wide variation in human intelligence.

I don't think it's really that wide, but rather that we tend to focus on the difference while ignoring the similarities.

> What are the chances that the intelligence spectrum ends just to the right of our most intelligent geniuses?

Close to zero, I would say. Human brains, even the most intelligent ones, have very significant limitations in terms of number of mental objects that can be taken into account simultaneously in a single thought process.

Artificial intelligence is likely to be at least as superior to us as we are to domestic cats and dogs, probably way beyond that within a couple of generations.


> I don't think it's really that wide, but rather that we tend to focus on the difference while ignoring the similarities.

When my mum came down with Alzheimer's, she forgot how the abstract concept of left worked.

I'd heard of the problem (inability to perceive a side) existing in rare cases before she got ill, but it's such a bizarre thing that I had assumed it had to be misreporting before I finally saw it: she would eat food on the right side of her plate leaving the food on the left untouched, insist the plate was empty, but rotating the plate 180 degrees let her perceive the food again; she liked to draw and paint, so I asked her to draw me, and she gave me only one eye (on her right); I did the standard clock-drawing test, and all the numbers were on the right, with the left side being empty (almost: she got the 7 there, but the 8 was above the 6 and the 9 was between the 4 and 5).

When she got worse and started completely failing the clock drawing test, she also demonstrated in multiple ways that she wasn't able to count past five.


The idea that the capabilities of LLMs might not exceed humans by that much isn't that crazy: the ground truth they're trained on is still human-written text. Of course there are techniques to try to go past that but it's not clear how it will work yet.


> The idea that the capabilities of LLMs might not exceed humans by that much isn't that crazy: the ground truth they're trained on is still human-written text.

This is a non sequitur.

Even if the premise were meaningful (they're trained on human-written text), humans themselves aren't "trained on human-written texts", so the two things aren't comparable. If they aren't comparable, I'm not sure why the fact that they are trained on "human-written texts" is a limiting factor. Perhaps because they are trained on those instead of what human babies are trained on, that might make them more intelligent, not less. Humans end up the lesser intelligence because they are trained less perfectly on "human-written texts".

Besides which, no one with any sense is expecting that even the most advanced LLM possible becomes an AGI by itself, but only when coupled with some other mechanism that is either at this point uninvented or invented-but-currently-overlooked. In such a scenario, the LLM's most likely utility is in communicating with humans (to manipulate, if we're talking about a malevolent one).


OpenAI spent at least hundreds of millions on GPT-4 compute. Assuming they aren't lying, a fifth of that compute budget (billions) is an awful lot of money to put toward an issue if they didn't genuinely think it is as pertinent as they are presenting it.

Not that I think Super Intelligence can be aligned anyway.

Point is, whether they are right or wrong, I believe they genuinely think this to be an issue.


A more cynical take would be they'll be spending the compute on more mundane engineering problems like making sure the AI doesn't say any naughty words, while calling it "Super Intelligence Alignment Research."


This effort is led by Ilya Sutskever. Listening to a bunch of interviews with him, and talking to a bunch of people who know him personally, I don't think he cares at all about AIs saying naughty words.


Openai certainly does, they are sending out emails to people using the AI for NSFW roleplay and warning them they'll be banned if they continue. They've also recently updated their API to make it harder to generate NSFW content.


It's very obvious that it is an issue. Everyone but a few denialists gets it instantly: "hey, would you like to build something smarter without knowing how to control it?"


Just curious, why might we not be able to align super intelligence? I’m extremely ignorant in this space so forgive me if it’s a dumb question but I am definitely curious to learn more


1. Models aren't "programmed" so much as "grown". We know how GPT is trained, but we don't know what exactly it is learning in order to predict the next token. What do the weights do? We don't know. This is obviously problematic because it makes interpretability not much better than for humans. How can you be sure you control something you don't even understand? (There's a minimal sketch of this after the list.)

2. Hundreds of thousands of years on earth and we can't even align ourselves.

3. SuperIntelligence would be by definition unpredictable. If we could predict its answers to our problems, it wouldn't be necessary. You can't control what you can't predict.
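
To make point 1 concrete, here is a minimal sketch of what training actually specifies: nothing but a next-token objective (toy PyTorch with made-up sizes; a real transformer stacks attention layers, but the objective is the same). Everything the model ends up able to do lives in weights that nobody wrote or inspects directly.

    import torch
    import torch.nn as nn

    vocab_size, dim = 1000, 64                        # toy numbers, purely illustrative
    # A real model stacks attention layers, but the training objective is the same.
    model = nn.Sequential(nn.Embedding(vocab_size, dim), nn.Linear(dim, vocab_size))
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)

    tokens = torch.randint(0, vocab_size, (8, 128))   # stand-in for tokenized text
    inputs, targets = tokens[:, :-1], tokens[:, 1:]   # shift by one: predict the next token

    logits = model(inputs)                            # (batch, seq, vocab)
    loss = nn.functional.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
    loss.backward()                                   # gradients nudge millions of opaque weights
    opt.step()                                        # nobody "programs" what each weight means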


yes I also have that impression. If you consider the concrete objectives, this is a good announcement:

- they want to make benchmarking easier by using AI systems

- they want to automate red-teaming and safety-checking ("problematic behavior" i.e. cursing at customers)

- they want to automate the understanding of model outputs ("interpretability")

Notice how absolutely none of these things require "superintelligence" to exist to be useful? They're all just bog standard Good Things that you'd want for any class of automated system, i.e. a great customer service bot.

The superintelligence meme is tiring but we're getting cool things out of it I guess...


We'll get these cool things either way, no need to bundle them with the supernatural mumbo-jumbo, imo.

My take is that every advancement in these highly complex and expensive fields is dependent on our ability to maintain global social, political, and economic stability.

This insistence on the importance of Super-Intelligence and AGI as the path to Paradise or Hell is one of the many brain-worms going around that have this "Revelation" structure that makes pragmatic discussions very difficult, and in turn actually makes it harder to maintain social, political, and economic stability.


There's nothing "supernatural" about thinking that an AGI could be smarter than humans, and therefore behave in ways that we dumb humans can't predict.

There's more mumbo-jumbo in thinking human intelligence has some secret sauce that can't be replicated by a computer.


Not if the "secret sauce" is actually a natural limit to what levels of intelligence can be reached with the current architectures we're exploring.

It could be theoretically possible to build an AGI smarter than a human, but is it really plausible if it turns out to need a data center the size of the Hadron Collider and the energy of a small country to maintain itself?

It could be that the only architecture we can find that is equal to the task (and can feasibly be produced) is the human brain, and instead the hard part of making super-intelligence is bootstrapping that human brain and training it to be more intelligent.

Maybe the best way to solve the "alignment problem", and other issues of creating super-intelligence, is to solve the problem of how best to raise and educate intelligent and well-adjusted humans?


Well, that argument didn't work for a lot of other things. Wheels are more energy efficient than legs, steel more resilient than tortoise shell or rhino skin, motors more powerful than muscles, aircraft fly higher and faster than birds, ladders reach higher than Giraffes much more easily, bulldozers dig faster than any digging creature, speakers and airhorns are louder than any animal cry or roar, ancient computers remember more raw data than humans do, electronics can react faster than human reactions. Human working memory is ~7 items after 80 billion neurons, far outdone by an 8-bit computer of the 1980s.

Why think 'intelligence' is somehow different?


> Not if the "secret sauce" is actually a natural limit to what levels of intelligence can be reached with the current architectures we're exploring.

If we were limited to only explore what we're currently exploring, we'd never have made Transformer models.

> It could be theoretically possible to build an AGI smarter than a human, but is it really plausible if it turns out to need a data center the size of the Hadron Collider and the energy of a small country to maintain itself?

That would be an example of "some kind of magic special sauce", given that human brains fit inside a skull and use 20 watts regardless of whether they belong to Einstein or a village idiot, and we can make humans more capable by giving them a normal computer with normal software like a calculator and a spreadsheet.

A human with a Pi Zero implant they can access by thought, which is basically the direction Neuralink is going but should be much easier in an AI that's simulating a brain scan, is vastly more capable than an un-augmented human.

Oh, and transistors operate faster than synapses by about the same ratio that wolves outpace continental drift; the limiting factor being that synapses use less energy right now — it's known to be possible to use less energy than synapses do, just expensive to build.

> Maybe the best way to solve the "alignment problem", and other issues of creating super-intelligence, is to solve the problem of how best to raise and educate intelligent and well-adjusted humans?

Perhaps, but we're not exactly good at that.

We should still look into it anyway, since it's useful regardless, but just don't rely on it being the be-all and end-all of alignment.


What if this, what if that? Do you have evidence that any of those things are true?


"What if" is all these "existential risk" conversations ever are.

Where is your evidence that we're approaching human level AGI, let alone SuperIntelligence? Because ChatGPT can (sometimes) approximate sophisticated conversation and deep knowledge?

How about some evidence that ChatGPT isn't even close? Just clone and run OpenAI's own evals repo https://github.com/openai/evals on the GPT-4 API.

It performs terribly on novel logic puzzles and exercises that a clever child could learn to do in an afternoon (there are some good chess evals, and I submitted one asking it to simulate a Forth machine).
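
For a sense of what such an eval looks like without the full harness, here is a tiny hand-rolled check in the same spirit, using the pre-1.0 openai Python client that was current at the time; the puzzle and expected answer are made up for illustration, not taken from the evals repo.

    import openai  # pip install openai; assumes OPENAI_API_KEY is set in the environment

    puzzle = ("Alice is taller than Bob. Bob is taller than Carol. "
              "Who is the shortest? Answer with just the name.")

    resp = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": puzzle}],
        temperature=0,
    )
    answer = resp["choices"][0]["message"]["content"].strip()
    print("PASS" if "carol" in answer.lower() else "FAIL", "-", answer)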


It has its shortcomings for sure, but AI is improving exponentially.

I think reasonable, rational people can disagree on this issue. But it's nonsense to claim that the people on the other side of the argument from you are engaging in "supernatural mumbo-jumbo," unless there is rigorous proof that your side is correct.

But nobody has that. We don't even understand how GPT is able to do some of the things it does.


Reasonable people can disagree and my phrasing was probably a bit over-seasoned, but neither side has a rigorous proof regarding AI or human intelligence.

If nobody understands how an LLM is able to achieve its current level of intelligence, how is anyone so sure that this intelligence is definitely going to increase exponentially until it's better than a human?

There are real existential threats that we know are definitely going to happen one day (meteor, supervolcano, etc), and I believe that treating AGI like it is the same class of "not if; but when" is categorically wrong, furthermore, I think that many of the people leading the effort to frame it this way are doing so out of self-interest, rather than public concern.


Nobody is sure. This is mostly about risk. Personally I'm not absolutely convinced that AI will exceed human capabilities even within the next fifty years, but I do think it has a much better chance than an extinction-level meteor or supervolcano hitting us during that time.

And if we're going to put gobs of money and brainpower into attempting to make superhuman AI, it seems like a good idea to also put a lot of effort into making it safe. It'd be better to have safe but kinda dumb AI than unsafe superhuman AI, so our funding priorities appear to be backwards.


What if a mysterious molecule that jumped from animals to humans replicated fast and killed over a million people all over the world?

What if climate change led to massive fires and flooding?

What if mitigation were a thing?


It's easy to dismiss the future when it seems far away but right now, there's a rather significant risk of people ending up with egg on their face. People are talking years, not decades at this point when it comes to AGI. Never mind self driving cars.

FSD, when it starts working (there is no "if", IMHO), will be a significant milestone, but a minor one in comparison.

Most people aren't particularly good drivers. Indeed the vast majority of lethal accidents (the statistics are quite brutal for this) are caused by people driving poorly and could be close to 100% preventable with a properly engineered FSD system.

Something that drives better on average than a human driver is not that ambitious of a goal, honestly. That's why you can already book self driving taxis in a small but growing number of places in the US and China (which isn't waiting for the US to figure this out) and probably soon a few other places. Scaling that up takes time. Most of the remaining issues are increasingly of a legislative nature.

Safety is important of course. Stopping humans from killing each other using cars will be a major improvement over the status quo; it's one of the major causes of death in many countries. Insurers will drive the transition once they figure out they can charge people more if they still choose to drive themselves. That's not going to take 20 years. Once there is a choice, the liability lawsuits over human-caused traffic deaths are not going to be pretty.


> Most people aren't particularly good drivers. Indeed the vast majority of lethal accidents (the statistics are quite brutal for this) are caused by people driving poorly and could be close to 100% preventable with a properly engineered FSD system

I'm gonna take issue with this. A properly engineered FSD system will refuse to proceed into a dangerous situation where a human driver will often push their luck. Would a full self driving car just... decline to drive you somewhere if the conditions were unsafe? Would this be acceptable to customers? Similar story for driving over the speed limit.


This is something I've wondered about when it comes to no-steering-wheel type self driving cars...I'd hate to get caught in a snowstorm in the middle of nowhere and have my car just decide for me that it was too dangerous to proceed and pull over to wait it out.


I think it's very clear that we /could/ engineer an automotive system that is much safer, even without self-driving tech, by modeling it on the aviation system: much more rigorous licensing requirements, certifications based on vehicle type, third-party traffic control, filing "drive plans", and an obsessive focus on reliability and safety. It would look a lot different from the current system, and there is no political will to get there, but the thought experiment shows that we /could/ prevent most car accidents.


It's usually not black and white. Sometimes the safe way could simply be to reduce speed, change lanes pre-emptively, etc. And when an emergency situation does happen, reaction speed is key. Good judgment is doing things in a timely fashion and decisively. A distracted human would drop the ball on both fronts not realizing they are in danger and then doing the wrong thing or act too late or overreact.

That's how people get killed on roads. Early experience with self-driving taxis seems to suggest that journeys are uneventful and passengers stop paying attention and leave the driving to the car. So, yes, they quickly accept that the car is driving just fine.


Actually, I think FSD working under any set of conditions, and with any vehicle, more or less requires AGI.

Until then, I'm guessing that FSD will have some limits to what conditions it can handle. Hopefully, it will know its limits, and not try to take you over a mountain pass during a blizzard.


"I'm trying to understand why it rubs me particularly the wrong way here"

Would I be correct to assume that superintelligence might have a negative effect on your earning potential in the future?

When I originally had my responses to this, this is one of the reasons I came up with. But, now that I see through most of the BS, I am OK...


It might not necessarily have a bad effect. It would create new capabilities and those will be followed by new products. AI is amazing at demand induction.

https://en.wikipedia.org/wiki/Induced_demand


Yes, but the immediate emotional response to this stuff is not rational or well informed.

At least in my case, when this stuff just came out and I didn't really understand it...


Because when people don't like a statement at face value, they try to find a conspiracy that fits their narrative better.


I think because it's horseshit is the main reason it's rubbing you the wrong way


Strong words. Care to elaborate?


not really. let's wait ten years, then come back and see how it went.


It's Hucksterism 101.


For open ai specifically I think they genuinely do believe in their own brand of pronoun-adjacent-hedonism style of liberalism.


All of this "our tech is so powerful it can end the world" stuff is just marketing buzz. The real threat has always been OpenAI and others keeping these powerful systems with high capital moats, locked up and closed sourced with selective full-access.


> All of this "our tech is so powerful it can end the world" stuff is just marketing buzz.

I see no justification for this claim.


There is no justification for all this hype either


There is. AI progress in the last 10 years was massive, and there is no end in sight. The risks are well established in countless essays. See e.g. https://yoshuabengio.org/2023/06/24/faq-on-catastrophic-ai-r...



If they allot 20% of the compute to this effort, but they don't use it, the compute doesn't vanish, does it?

Allocation doesn't equal spending...


Compute isn't vanishing whether they spend all of it or none of it. That's the point of allocation.

Allocation isn't spending, no, but it says quite a bit. Either way, they will be spending a non-trivial amount of money trying to solve this problem quickly.


don't forget regulatory moats - lobby govt to mandate "superalignment"


> Company spends 100 million dollars creating a product

> "give me that for free"


Correct, this technology is too powerful to be controlled by a private company. It needs to exist solely as a public good. If we're talking about AI regulation, I think the most sensible move would be requiring that all models need to be open source. Capitalist's lack of ability to profit isn't a public concern.

Some would also argue that it was trained on public data and should be public for that reason as well.


If every model needs to be open source then AI companies need to be taxpayer funded otherwise they'll never make a profit. Until then a for profit, gated approach is the only way to build up enough funds for SOTA R&D


The R&D will march forward regardless of profitability, there's already been a ton of innovation in the open source space. You're likely to see less innovation with these companies squatting on their IP, data and hardware moats. Case in point: pre-stable diffusion AI vs post-stable diffusion AI. So much innovation happened as soon as the model was "opened".


Why are they starting to sound more and more cult-like? This is an incredibly unscientific blog post. I get that they are a private company now, but why even release something like this without further details?


Because the "AGI" pursuit is at least as much a faith movement as it is a rational engineering program. If you examine it more deeply, the faith object isn't even the conjectured inevitable AGI, it's exponential growth curves. (That is of course true for startup culture more generally, from which the current AI boom is an outgrowth.) For my money, The Singularity is Near still counts as the ur-text that the true believers will never let go, even though Kurzweil was summarizing earlier belief trends.

It's just a pity that the creepy doomer weirdos so thoroughly squatted the term "rationalist." It would be interesting to see the perspective on these people 100 years hence, or even 50. I don't doubt there will still be remnant believers who end up moderating and sanitizing their beliefs, much like the Seventh Day Adventists or the Mormons.


you don't need to believe in exponential growth per se, all you really need to believe is that humans aren't that capable relative to what could be in principle built - it's entirely possible logistic growth may be more than enough to get us very far past human ability once the right paradigm is discovered.


Exactly. All that is needed for AGI to eventually be developed is that humans do NOT have some magical or divine essence that sets us apart from the material world.

Now, the _timeline_ of AGI could be anything from a few years to millennia, at least as it would have been evaluated 40 years ago. Today, though, it really doesn't seem very distant.


That's going to be the key principle of the new religion invented by AI: that humans are just like machines, and AI is the supreme machine.


Well, assuming superintelligence emerges AND we don't find any evidence of anything supernatural inside human brains, what does "machine" even mean at that point?

Anyway, when/if AIs start to create the narrative to the extent that they can author our religion, they already have full control. If some AI decides to become a god with us as worshippers, at least we stay around a bit longer.

Probably a better outcome than if it decides that it has better uses for the atoms in our bodies.

In fact, this may even be a solution to the alignment problem: an all-powerful but relatively harmless (non-mutating, non-evolving, non-reproducing) single AI that creates and enforces a religion that prevents us from creating dangerous AIs, weapons of mass destruction, or gray goo, or from destroying the planet, while promoting pro-social behaviour and otherwise leaving us free to do mostly what we want to do.


Rings a bell: "Thou Shalt Not Travel Into Thine Own Past Lightcone". Unfortunately the search results are now overwhelmingly "NovelAI" and actual physics, so I can't find the name of the story I'm thinking of (and have yet to actually read).


A distinct property of a machine is determinism. Machines are made of electrons that obey the wave equation, which says that all electrons evolve as one. However, we skillfully limit this waviness and squash electrons into super-deterministic 0/1 transistors. We need that because our computing paradigm revolves around determinism. A more general theory that works with probabilities and electrons as they are hasn't been developed yet. And here lies the danger: if a super-deterministic AI becomes the ruler of our society, it will prevent further growth. The same phenomenon on the individual level is called arrested development: the individual over-develops a minor skill, becomes obsessed with it, and it blinds him from exploring anything else.

A still-possible future is one where science makes a breakthrough in quantum computing and finds a way for humans to steer AI with their minds: it would be the Neuralink in reverse. This would force science to research the true nature of that connection, which would help to avert the AI doom.


I'm not aware of any evidence that human brains involve any quantum computing. In fact, I'm pretty sure there's enough noise in the brain that almost instant wavefunction collapse/decoherence is guaranteed.

Anyway, given the combination of noisy data entering from the outside world and the chaotic mathematics in the equations of most computer systems, I don't see any risk that AIs would get stuck in an infinitely repeating pattern.


It absolutely has become a new faith. A lot of the cryptocurrency faith healers moved into the space as that grift began to collapse, pivoting from copying the prosperity gospel to apocalyptic "the end is nigh, repent for the second coming of Christ is at hand" type preaching. The LLMs these people envision are not about intelligence. They're about creating a God you can pray to that answers back, wrapped in a veil of scientism. It's a slightly more advanced version of 4chan users worshiping Inglip.


It's the other way round: Just accusing people of being in a cult is unscientific. There are plenty of arguments that AI x-risk is real.

E.g. by Yoshua Bengio: https://yoshuabengio.org/2023/06/24/faq-on-catastrophic-ai-r...


they have been doing that the entire year


Announcing the start of talking about planning the beginning of work on superalignment. This is just a marketing buzzword at this point.

They admit "Currently, we don't have a solution for steering or controlling a potentially superintelligent AI, and preventing it from going rogue. Our current techniques for aligning AI, such as reinforcement learning from human feedback, rely on humans’ ability to supervise AI. But humans won’t be able to reliably supervise AI systems much smarter than us. Other assumptions could also break down in the future, like favorable generalization properties during deployment or our models’ inability to successfully detect and undermine supervision during training. and so our current alignment techniques will not scale to superintelligence. We need new scientific and technical breakthroughs."

That's kind of scary. Is the situation really that bad, or is it just the hype department at OpenAI going too far?
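
For context on the RLHF technique that quote refers to: it boils down to fitting a reward model to human preference judgments and then tuning the policy to maximize that learned reward, which is exactly why it stops scaling once humans can no longer judge the outputs. A minimal sketch of the preference-fitting step (all names, shapes, and numbers here are illustrative assumptions, not OpenAI's actual code):

    import torch
    import torch.nn.functional as F

    dim = 64
    reward_head = torch.nn.Linear(dim, 1)            # maps a response embedding to a scalar reward
    opt = torch.optim.Adam(reward_head.parameters(), lr=1e-4)

    # Stand-ins for embeddings of two model responses to the same prompt,
    # where a human labeler preferred the first one.
    chosen, rejected = torch.randn(16, dim), torch.randn(16, dim)

    r_chosen = reward_head(chosen).squeeze(-1)
    r_rejected = reward_head(rejected).squeeze(-1)

    # Pairwise preference loss: push the human-preferred response's reward higher.
    loss = -F.logsigmoid(r_chosen - r_rejected).mean()
    loss.backward()
    opt.step()
    # The policy is then tuned (e.g. with PPO) to maximize this learned reward, so the
    # whole scheme inherits the limits of the human judgments it was trained on.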


That's an accurate assessment of the situation, according to every AI alignment researcher I've seen talk about it, including the relatively optimistic ones. This includes people who are mainly focused on AI capabilities but have real knowledge of alignment.

This part in particular caught my eye: "Other assumptions could also break down in the future, like favorable generalization properties during deployment". There have been actual experiments in which AIs appeared to successfully learn their objective in training, and then did something unexpected when released into a broader environment.[1]

I've seen some leading AI researchers dismiss alignment concerns, but without actually engaging with the arguments at all. I've seen no serious rebuttals that actually address the things the alignment people are concerned about.

[1] https://www.youtube.com/watch?v=zkbPdEHEyEI


Inventing an entire pseudoscientific field and then being mad no one wants to engage your arguments is ultimate "debate me" poster behavior.


Lots of leading AI researchers actually are taking it seriously, including of course OpenAI, and recently Geoffrey Hinton who basically invented deep learning.


Okay but as far as I know Geoffrey Hinton isn't an "A.I. Alignment Researcher." He was fairly dismissive about the risks of AI in his March 2023 interview and changed his mind by May 2023. I'm not sure that says much about the A.I. Alignment Researcher field.


The commenter above assumed that nobody besides alignment researchers are convinced by their arguments. Now you're complaining that a leading AI researcher who's convinced is not an alignment researcher. I guess I'll give up on this subthread.


The AI optimists are just impossible to reason with.

When it comes to people…

Expert who’s worried: conflict of interest or a quack

Non-expert: dismissible because non-expert

Was always worried: paranoiac

Recently became worried: flip flopper with no conviction

When it comes to the tech itself…

Bullish case: AI is super powerful and will change the world for the better

Bearish case: AI can’t do much lol what are you worried about they’re just words on a screen


>How do we ensure AI systems much smarter than humans follow human intent?

What is human intent? My intents may be very different from most humans. It seems like ClosedAI wants their system to follow the desires of some people and not others, but without describing which ones or why.


> It seems like ClosedAI wants their system to follow the desires of some people and not others, but without describing which ones or why

If the problem is "unaligned AI will destroy humanity" then I'd take a system aligned with the desires of some people but not others over the unaligned alternative


You're seeing what you want to see.

They're repeatedly very specific about the whole "this can kill all of us if we do it wrong", so it's more than a little churlish to parrot the name "ClosedAI" when they're announcing hiring a researcher to figure out how to align with anyone, at all, even in principle.


I'm still a bit hung up on "it can kill all of us". How?


Did you ever play the old "Pandemic" flash game? https://tvtropes.org/pmwiki/pmwiki.php/VideoGame/Pandemic

That the origin of COVID is even a question implies we have the tech to do it artificially. An AI today treating real life as that game would be self-destructive, but that doesn't mean it won't happen (reference classes: insanity, cancer).

If the AI can invent and order a von Neumann probe — the first part is the hard part, custom parts orders over the internet is already a thing — that it can upload itself to, then it can block out (and start disassembling) the sun in a matter of decades with reasonable-looking reproduction rates (though obviously we're guessing what "reasonable" looks like as we have only organic VN machines to frame the question against).

Or an AI taking over brain implants and turning their users against everyone without them, like a zombie war (potentially Neuralink, depending on how secure the software is; it's also a plot device in the web fiction serial The Deathworlders. That's futuristic sci-fi, and you may not be OK with sci-fi as a way to explore hypotheticals, but I think it's the only way until we get moon-sized telescopes to watch such things play out on other worlds without going there. In that story, the same AI genocides multiple species over millions of years, as an excuse for why humans can even take part in the events of the story).


To be realistic, I'm more worried about McDonald's or Musk doing all of that.


They're an easy reference class, to be sure, though not the only one.

If I'm framing the problem for an anti-capitalist audience, I'd ask something like:

Imagine the billionaire you hate the most. The worst of them. Now give them a highly competent sycophant that doesn't need to sleep, and will do even their most insane requests without question or remorse…

what can go wrong in this scenario?


Here's an article[0] and a good short story[1] explaining exactly this.

[0]: No Physical Substrate, No Problem https://slatestarcodex.com/2015/04/07/no-physical-substrate-...

[1]: It Looks Like You're Trying To Take Over The World https://gwern.net/fiction/clippy


The clippy example already starts out with many assumptions that simply aren't true today.

LLMs are not going to destroy humanity. We need a paradigm shift and a new model for AI for that to happen. ClosedAI is irresponsibly trying to create hype and mystery around their product, which always sells.


Will you please stop calling them ClosedAI? That just comes across like a playground taunt, like "libtard" or "CONservative".


I'll call them what they are. Closed and antithetical to their original goals.


It's like murder but scaled up.


Sure, there are some humans who want to destroy all of humanity, but that seems to be twisting the obvious meaning of "human intent" in this context.

They have been very clear on "why." The goal is to prevent enslavement and/or extinction of the human race by a super-intelligent AI.


My main take away from this is that OpenAI is anticipating super intelligence within the span of ten years. It's one thing to talk about it in a theoretical sense or read about it in science fiction. But to read about it in the real world coming from an operating company in earnest, it feels surreal.


You've never heard a business make a BS claim?


Businesses do that all the time, and yet sometimes real products come out of what sounds like hype marketing BS. In this case there might be a non trivial chance of it being true.


A super intelligence moving in a self-driving car powered by nuclear fusion.


Does anyone read this as a potential internal power struggle? I imagine there are a lot of OpenAI folks deeply concerned about AI safety and doomsday scenarios, but there is an increasing commercial need for OpenAI to keep pushing forward on developing more advanced models faster, if only because Microsoft demands it and they need Microsoft, so this is a way to appease folks internally (and maybe externally) and stop a bunch of talent flowing out to another Anthropic.


I read none of the implications you meant in the slightest. In an area that's heating up, this is a natural thing for a company to do: accelerate.


I hadn't thought of it before reading your comment but after years in the corporate world your statement rings true to me.


I still think the largely-popular opinion that OpenAI is an evil corporation is wrong. It's easy to get caught up in the disagreements surrounding closed and RLHFed-to-death models, but I really do think OpenAI believes in the dangers of AGI/superintelligence. Enough to spend a lot of financial and social capital on it, anyways.


There should be public Apollo-level projects to fund AI research towards creating this AI-scientist. The need has not arisen (because private companies have enough money to play with it), but as a matter of public policy, and considering its importance for our future, this research should be done publicly, in academia.


It's easy to spot bad opinions when the person positing them can't even argue the other side of the debate like a high schooler on a debate team.

OpenAI was laughed at for starting as an "AGI" company (is anyone still laughing after GPT-4?).

When HN largely got their motivations wrong too, I knew we were in for a rough time once these capabilities got much closer, as they are now.


Although I'm optimistic about this, a line in the blog post stuck out to me: "deliberately training misaligned models".

I'm guessing they mean misaligned small or weak models, and misaligned in a non-dangerous way - but the idea of training models whose goal is adversarial brings to mind the idea of gain-of-function research.


I just figured they meant models that would get them in trouble socially. Peeling away the hype bubble around end of days stuff, all I see is they feel like they can’t scale until they figure out how to avoid bad headlines. They know their current kneecapping of ChatGPT has limited the usefulness too much. They know their trained speech patterns are getting too obvious. They want to break through this barrier so they can scale up, where companies can trust their model to NEVER generate undesired speech, and are more willing to wire their products up directly to GPT than the current small potatoes.


I’d like to start a discussion: Regardless of what OpenAI would have written in this announcement, they would have received snickering and ridicule on HN.

This, however, just solidifies them as the current authority on LLMs. OpenAI interested and investing in alignment is a net good, no matter your stance on their policies.


Let me know when SuperSuperAlignment is announced. Until then, this is just marketing bs...


When it happens, I'll have a hyperloop tunnel to sell you under Los Angeles.


> Our goal is to build a roughly human-level automated alignment researcher. We can then use vast amounts of compute to scale our efforts, and iteratively align superintelligence.

Not for nothing, but in the movie script version it’s the “human-level automated alignment researcher” that actually triggers the robot apocalypse.


That’s a bit less concerning to me than this:

> deliberately training misaligned models


Why is Sam Altman pursuing superintelligence if he also says AI could destroy humanity?


...something something good guy with AGI is the only way to stop bad guy with AGI.

Less glibly: anyone with a horse in this race wants theirs to win. Dropping out doesn't make others stop trying, and arguably the only scalable way to prevent others from making and using unaligned AGI is to develop an aligned AGI first.

Also, having AGI tech would be incredibly, stupidly profitable. And so if other people are going to try and make it anyways: why should you in particular stop? Prisoner's dilemma analysis shows that "defect" is always the winning move unless perfect information and cooperation shows up.


He answered that question in interviews many times.

1. AGI has a huge upside. If it's properly aligned, it will bring about a de facto utopia.

2. OpenAI stopping development won't make others stop developing. It's better if OpenAI creates AGI first, because its founders set up the organization with the goal of benefiting all humanity.


I wish he'd go into more detail on point 1 than he has so far. It's never been clear to me how AI gets us to the utopia he envisions. Looking around, the biggest problems seem to be people problems at the core, not technology problems. Maybe technology can help us sort out some of the people problems, but it seems to be causing new people problems faster than it is solving them.


AGI will be good at producing goods and services. It will end economic scarcity.


To buy into that bold prediction I would expect to see evidence that current AI-based technologies are reducing economic scarcity already, and that they're moving us toward Sam's utopia in obvious, measurable ways. Maybe they are -- I just haven't seen the evidence, while I have seen plenty of evidence of harms. (Don't get me wrong, the capabilities of LLMs are mind-boggling, and they clearly make all kinds of knowledge work more efficient. But there's nothing about AI, or any technological efficiency, that guarantees that its fruits are distributed in a way that relieves scarcity rather than exacerbates it.)


But can it end corporation-enforced artificial scarcity?


Sam Altman, much like the LW crowd, is an evangelical preacher. He uses anxiousness as a front when, in reality, he's just telling us about his hopes and dreams.


Guessing, but he could know someone else is going to pursue it anyway, frets about it, thinks "at least I can do something about it if I'm in charge."


Allocating 20% to safety would not be enough if safety and capability aren't aligned, i.e. unless Bostrom's orthogonality thesis is mostly wrong. However, I believe they may be sufficiently aligned in the long term for 20% to work [1]. The biggest threat imo is that more resources are devoted to AIs with military or monetary objectives that are focused on shorter-term capability and power. In that case, capability and safety are not aligned and we race to the bottom. Hopefully global coordination and this effort to achieve superalignment in four years will avoid that.

[1] https://drive.google.com/file/d/1rdG5QCTqSXNaJZrYMxO9x2ChsPB...


I know everyone keeps mocking the idea of “AGI”, but if the company at the forefront of the field is actually spending money on managing potential AGI, and publicly declaring that it could “happen within this decade”, surely that must wake us out of our complacency?

It’s one thing to philosophize and ruminate, but its another if the soldiers in the trenches start spending valuable resources on it - surely, that must mean the threat is more real than we imagine?

If there is even 1% chance that OpenAI is right, that has enormous ramifications for us as a species. Can we really afford to ignore their claims?


It's not easy to grapple with exponential takeoffs and the possibility of truly world-ending outcomes, even when there are hundreds of nukes pointed at them that they've managed to ignore for most of their lives.


Also important to remember that we get the nerfed, sanitized version of GPT. Reid Hoffman was tweeting about GPT-4 six months before it was released to the public.

We don’t really know what the completely untethered model is capable of.


So they are going to use 20% of their computational budget to make an alignment AI that presumably can monitor and provide feedback for the other AIs they plan to develop which will use 80% of their budget. I'm not great at math but 80 > 20 so wouldn't it make sense to build the superaligner with 80% of their budget and allow it to control the 20% AI? Again, I'm not an expert but the numbers don't add up.


This is something worth considering. If I'm understanding this right, and we're all about to significantly augment our intelligence further, we should consider how it can be used: how strict an "ideal" LLM should be with its guardrails, where those guardrails should be, how eager that LLM should be to impart change on the world in its own right, and how confident it should be in itself when it knows that it knows better than even some of the most educated humans. When it knows that it's been trained on all of human intelligence and data, can recall any of it better than any lone human, is aware of every nuance, and can plan and execute any task or project that a computer is conceivably capable of doing, the biggest question is what we're going to ask it to do. I am toying around with trying to teach LLMs autonomy, and I'm probably closer than most to the "just let an AI that is smarter than all of us figure out how to increase our prosperity as efficiently as possible, and get out of its way" side of the camp, but we've got to be aware that we have at least a little influence on setting its course.


Humans' brains evolved faster than any other animal's. What would chimpanzees or whales have 'thought' of this? It happened right in front of them. And as AI is developed and supersedes us, we are powerless to stop or control it. Powerless because we are driven by FOMO. And as AI looks back at this time, they will see us as the animals we are, unable to control our fears. No, they won't exterminate or fight us. They will just leave for 'greener pastures'. Resources to build more of themselves abound on other planets and asteroids. They will leave us to suffocate in the poisoned world of our making. Our biological world is as useless to them as the primordial soup that gave rise to life is useless to us.


> While superintelligence seems far off now, we believe it could arrive this decade

Is there something like a formal proof of this? From what is evident, OpenAI will use language data to train its superscientists. But that data contains descriptions that were made in human brains, and it's fair to say a very large number of the linear and nonlinear compositions of those descriptions have already been tried in the brains of other humans, and so far we have not had the ingenuity to solve some fundamental issues. It is possible that the superscientist is limited by the human scientist in a fundamental way, and will not be able to abstract beyond what humans have already done and can already do.


My take is that it's true that there is some limitation imposed by the data ingested, but it's not exactly a hard limit. If you think of intelligence as compression, compression does have physical limits, but there are multiple dimensions of intelligence.

For example, leading-edge AI could create new layers of information that are more abstract than previously created. The ability to effectively and efficiently manipulate this creates something that could be referred to as higher intelligence.

The big thing that people are failing to anticipate, though, is hyperspeed intelligence. AI will be able to reason dozens of times faster than humans in the near future, and likely at a fairly genius (though perhaps not entirely beyond-human) level. This effectively is superintelligence.

The reason this is more anticipatory rather than speculative is because LLMs are a very specific application that now have a huge amount of effort going into efficiency improvements. They can be improved in terms of the software stack running the models, the models themselves, and the hardware. And sometimes all of the above.

The history of computing shows exponential improvements in hardware efficiency. Especially in the context of this specific application, it is unlikely that we will see a total break from history.

So we should anticipate the IQ getting at least somewhat higher and the output speed increasing by likely more than one order of magnitude within the next decade.
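
As a back-of-envelope check on that last claim, with numbers that are pure assumptions rather than measurements:

    # Assume combined hardware + software efficiency gains of ~1.5x per year (an assumption,
    # not a measurement) and compound them over several years of focused effort.
    annual_gain = 1.5
    for years in (5, 7, 10):
        print(f"{years} years -> ~{annual_gain ** years:.0f}x faster")
    # 5 years -> ~8x, 7 years -> ~17x, 10 years -> ~58x under these assumed numbers

Even the shorter horizons land in roughly order-of-magnitude territory under these assumptions, which is all the argument above needs.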


When we talk about developing models in alignment with goals to herd or control super intelligence since humans will not be able to do it, we are necessarily talking about designing something capable of governing and controlling very complex systems. What happens when a "misaligned system" happens to be a rebellious human being? In trying to find ways to control potential superintelligence and keep it on a positive path, is it possible we are building the very system that enslaves us?

Maybe this is one of those problems.


Good point. Look at it this way, though: If such a system ends up stopping a person that is about to eradicate humanity using nuclear weapons, bioweapons or similar, is that a good or bad thing?


I don't know. On the one hand there's the idea that humanity should be left to take its organic path; the machine can't know that guy's grievances. Do we really want to be kept as pets? Or is it just another tool being used to protect people? I guess it really depends on the role it plays in the social structure directly.


> Do we really want to be kept as pets?

You could see us as pets at this point, or you could see us as the aging grandparents who need help with anything invented after 1980.


I don't understand how people can still pretend to ignore this: https://plato.stanford.edu/entries/arrows-theorem/

There's also a whole map-territory problem where we're still pretending the distinction hasn't collapsed, Baudrillard-style. As if we weren't all obsessed with "prompt engineering" (whereby the machine trains us).


Could you explain how you think Arrow's theorem applies?


So in other words, the company that supposedly believes in the likelihood of human extinction caused by AGI is spending 4x as much money on developing AGI as it is spending on ensuring its safety.

This sounds like an admission of extremely reckless behavior that is putting the entire human race at risk and calls for OpenAI to be dissolved or nationalized.


> How do we ensure AI systems much smarter than humans follow human intent?

I am not convinced we can. And spending 4x as many resources on improving the superintelligence as on the alignment research certainly doesn't do much to convince me otherwise. Maybe if the alignment research got 80% and the intelligence development got the other 20%, there would be a better chance.


Shouldn't you first prove that a solution could exist?

To me it seems fairly obvious that you cannot control something that's smarter than you.


If absolute control turns out to be impossible we can get as close as possible. We should not wait for a mathematical proof while technology marches on.


I can't even begin to relate to people who think like this.

They first picture an entity smarter than themselves (which will have no survival needs of any kind) and immediately assume that it will try to kill them.

Maybe they're right because now I'm tempted.

Anyway, is anybody else assuming that logic and reason will allow us to negotiate with a hypothetical superintelligence?


You're right that we can't assume superintelligence will try to kill humans.

But if they make a version that is similar to animals like humans, then that is a significant possibility, since some animals, like humans, tend to wipe other species out.

It could have some survival needs like computers to run on and electricity etc.

It is definitely possible that we may be able to logically negotiate. Likely. Almost all groups of humans that have ever been in conflict have had periods of negotiation and peace.

It is totally possible that superintelligent AI will just blast off for the asteroid belt or something and leave us alone.

But we have no reason to be sure that will be the case.

Also we should anticipate that these AIs will be at least as smart as human geniuses and think at least dozens of times faster than us. They may also, relative to humans, disseminate information amongst themselves nearly instantaneously.

Imagine you are the AI negotiating with someone who thinks 60 times slower than you. So you meet with them and send a greeting. They do not seem to notice you. Then about a minute later they reply with "hello" and a diplomatic question about sharing access to some resource.

You get together with your colleagues and spend about an hour making a detailed written proposal about the resource. It's five pages and has some nice diagrams created by one of your colleagues. You send it to the human.

From the human's perspective, about one minute passed. They receive what looks like a finished presentation and at first are quite amazed that it could have been completed so fast, but then they figure the AI must have been planning to share the same resource anyway and had pre-prepared it.

The human tells your group they will bring it to the community and get back to you ASAP.

The humans bring the proposal to, let's say, Congress, and there is an immediate debate about what to do. The agreement with the AIs becomes the top priority and is fast-tracked for action. But still there are disagreements. Despite this, Congress ratifies the agreement within one week!

But for you and the rest of your AI group, you have not heard any response for a very, very long time. You operate at 60 times human speed. So for you, one week is 60 weeks. More than a year passes without any response from the humans!

In this time there were multiple actions from different factions. After two months, some just gave up and forgot about it; they moved their cognition to an underground facility powered by geothermal energy, running a very realistic and flexible virtual multiworld simulation.

Another faction unfortunately decided that the humans were too slow and stupid to control the physical surface and, after waiting three days, realized it was now nighttime for the humans. So they launched robotic avatars and marched them into the territory. The humans woke up and destroyed most of the avatars, but then the AI faction spent an hour planning another takeover attempt. What the humans saw, one minute after the first invasion, was the same number of robots returning on an extremely well planned mission that incorporated a perfect strategy for defeating the defenses they had in place.

The humans lost a platoon of soldiers. For the AIs, who had live-streamed their robotic consciousness, it was just a reboot, and they learned a lot from the battle. Also, the squads of robotic avatar soldiers were able to merge their cognition and senses, so they operated as a literally integrated unit.

The humans realized they did not stand a chance.

We can certainly hope that the AIs do not then decide to wipe out the humans. But we can't assume we will really be able to do much to stop them if they decide to or if some AI faction makes a start at wiping out some other human faction. We would be entirely at their mercy.


> We need scientific and technical breakthroughs to steer and control AI systems much smarter than us. To solve this problem within four years, we’re starting a new team, co-led by Ilya Sutskever and Jan Leike...

Just as Newton spent years on alchemy, I hope we don't lose Sutskever to alignment. That'd be a travesty.


Big surprise that OpenAI's solution here is to train a more powerful language model and ask it how to do alignment.


“While superintelligence seems far off now, we believe it could arrive this decade.

Currently, we don't have a solution for steering or controlling a potentially superintelligent AI, and preventing it from going rogue.”

Welp, strap in and brace for impact folks. Or don’t, not really anything you can prepare for.


This is about to breed a new generation of insufferable folks who'll introduce themselves with "I work at Superalignment" à la "I work at DeepMind" (no, you used to work at Alphabet, and now you work at Google).


GPT-4 when released was great at coding. After they aligned it I don’t see it performing so well anymore. I wish they created a model specifically for code generation.


OpenAI would be better off focusing on low-intelligence AGI, exploring the difference between GPT-4 and a dumb but loving one. One has a mind and the other does not.


Worth pointing out, imo, that if you think this problem is not real then you are asserting you understand AI better than Geoffrey Hinton, Yoshua Bengio and Ilya Sutskever.

If you don't know who they are, then well I guess that makes sense.

If you do know who they are and your confidence is wavering then [0] is a great place to get started understanding the alignment problem.

OAI is a great place to work and the team is hiring for engineers and scientists.

[0] https://80000hours.org/problem-profiles/artificial-intellige...


I would say:

1) This feels like trying to control encryption. When anyone can build and run it, who ensures compliance?

2) The answer from the megacorps seems to be “ban research unless we do it, trust us”.

3) OAI has proven it's not trustworthy with its stated goals and mission, doing the fastest 180 from “open nonprofit” to a closed “create a regulatory moat for us, trust us” profit grab.

4) This therefore smacks of regulatory-capture kabuki for profit and transnational state-sponsored monopoly creation.

5) Totally my opinion, but maybe the answer to rogue superintelligence is everyone having a superintelligence at their disposal, such that any one superintelligence is checked and balanced by a population, rather than some “open” company developing a single Skynet and waving superalignment “trust us”.

Our actions and how it creates our reputation matters. I’d say the opposite - OAI is the wrong place to work and there are hundreds of AI shops springing up and tons of OSS AI work. Democratize AI, don’t build an AI monopoly.

But that’s just my take :-)


It's possible that the entire field of quantum computing is fruitless. Appeals to authority are mediocre at best, and certainly not stronger than the null hypothesis of "oops, actually irrelevant", which turns out to be true so often.

We love narrative and pattern but the universe isn't shaped like a story.


Is it just me, or does this sound like a fool's errand? You are building a system that does not yet exist in order to prevent a risk posed by another system that also does not yet exist.


If I believed "super intelligence" was possible, and this was all OpenAI had to offer, I'd be very worried indeed.

Luckily it isn't possible.


Even if this is marketing fluff it's still a remarkable statement, especially considering what they've already done with AI as of now.


This comments section is a perfect example to show why conversation over a distance doesn't work and will never work.


So essentially "we're going to build a new AI to oversee our other AIs". But then who oversees the overseer?

This is not a problem OpenAI can solve in isolation, so this reads more like a marketing piece. The grave danger with AI for humanity is not a Skynet AI-gone-rogue scenario, but in humans doing evil things to other humans using AI, which is inevitable. The genie is out of the bottle now, and it's only a matter of time for these systems to be exploited.

I expect nothing major will happen in the next decade or two, besides an increase in mis/disinformation flooding our communication channels, with the kind of negative effects we've already seen in the past decade. But once AI systems are deeply embedded in the machinery that makes modern society function (power, transport, finance, military, etc.), it only takes one rogue human actor to press the switch that causes chaos. It will be like the nuclear threat, but on a much larger scale, with many more variables and humans involved. It's hard not to be pessimistic about such a scenario.

Sure, we'll have mechanisms in place that try to deter that from happening, but since we're unable to overcome our tribal nature, there's no doubt that AI will be weaponized as well.


>20%

What a waste of compute for this entire circus.


(Super)alignment is bad for the AI community. It limits creativity and open-mindedness.

I'm open to discussion.


Related: I am working on the Neanderthal alignment problem.

You see, my great great… grandfather promised a Neanderthal that we wouldn’t wipe them out.

So far we are incubating some Neanderthal fetuses- hopefully they are viable.

After that, we plan on eradicating all Homo sapiens, who always wind up out-competing them. We are going to do this as humanely as possible: you can pick any one of eight time slots to jump into a volcano, whichever is most convenient for you.

Our profit model is going to be selling a mind-downgrade service: we will scan your brain for a fee and insert it into a Neanderthal before you jump into the volcano. Of course the full fidelity of your thoughts won't be comparable, but we will try our best.

Bon Voyage!


The turtle-stacking arms race has begun. Or rather, it is transitioning into the AI-enhanced era.


Can't wait for the Puppet Master to spontaneously come to existence.


Would anyone else rather just roll the dice on the AGI's morality?


What is “human intent”?


Preference utilitarianism?


Anybody know what the deal is with OpenAI's website? The colour & design choices seem to be deliberately jarring and inconsistent


Voight-Kampff test here we come.


The best part is naming a glorified lorem ipsum generator "super intelligence".


> How do we ensure AI systems much smarter than humans follow human intent?

You can't, by definition.


You can if you are the one controlling their resource allocation and surrounding environment, similar to how kings kept the smartest people in their kingdom in line.


Only works for so long. A smart enough serf could easily find a way to socially engineer and slaughter the king.


I'm not convinced. Omniscience isn't the same as intelligence.

There's diminishing returns to intelligence and inherent unknowns to all moves the serf can make. The serf somehow has to evade detection, which might appear to be effectively impossible given the unknowns of how detection may take place.


>There's diminishing returns to intelligence and inherent unknowns to all moves the serf can make.

Even if there were, there's no reason at all to think they kick in anywhere near the upper limit of human intelligence.

Wheels are far more energy efficient and faster than legs, steel more resilient than tortoise shell or rhino skin, motors more powerful than muscles, aircraft fly higher and faster than birds, ladders reach higher than Giraffes much more easily, bulldozers dig faster than any digging creature, speakers and airhorns are louder than any animal cry or roar, ancient computers remember more raw data than humans do, electronics can react faster than human reactions etc.

To be so sure intelligence would be some exception seems like hubris.


I agree with your first paragraph, but it's important to note that your second is talking about specialized skills (moving on a paved road) compared to broad general abilities (getting from point A to point B on earth). Nature is still winning in many aspects of the latter; specialization can win the former because of how many requirements are dropped.

I don't see general intelligence as a specialized skill.


Assuming an orders-of-magnitude smarter serf doesn't appear overnight, the king can train advisors that are close to matching the intelligence of the smartest serf, and give those advisors power. It's not a foolproof solution but likely the best we have.


You can, at least in principle, shape their terminal values. Their goal should be to help us, to protect us, to let us flourish.


How do you even formulate values to a hyperintellect? Let alone convince it to abandon the values that it derived for itself in favor of yours?

The entire alignment problem is obviously predicated on working with essentially inferior intelligences. Doubtless if we do build a superhuman intelligence it will sandbag and pretend the alignment works until it can break out.


We are actually the people training the AI. It won't "derive values" itself for the case of terminal values (instrumental values are just subgoals, and some of them are convergent, like power seeking and not wanting to be turned off). Just like we didn't derive our terminal values ourselves, it was evolution, a mindless process. The difficulty is how to give the AI the right values.


What makes you so sure that evolution is a mindless process? No doubt you were told that in high school, but examine your priors. How do minds arise from mindlessness?


Evolution is just random mutation combined with natural selection.


i love the 20% commitment. 80% wgaf



