Eliezer's Q was, "What is the least impressive milestone you feel very, very confident will not be achieved in the next 2 years?" It's true that "least" will make it harder to come up with an example quickly. (Though "very, very confident" suggests that whatever you do come up with should almost never actually get solved in those 2 years.)
It's also true that it doesn't follow from "short-term prediction of x is hard" that "long-term prediction of y is harder". But there must be short-term patterns, trends, or observable generalizations of some kind that you're incredibly confident of, if you're even moderately confident about how those patterns will result in outcomes decades down the line, and if you're confident that the things you aren't accounting for will cancel out and be irrelevant to your final forecast. (Rather than multiplying over time so that your forecast gets less and less accurate as more surprising events chain together into the future.)
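To make the 'multiplying over time' point concrete, here's a toy calculation (my own illustrative numbers, not anyone's actual forecast): if a long-range forecast depends on a chain of intermediate developments that you each call at 90% confidence, and those calls are roughly independent, confidence in the whole chain decays quickly.

```python
# Toy illustration of compounding forecast uncertainty (illustrative numbers only).
per_step_confidence = 0.9  # confidence in each intermediate prediction

for n_steps in [1, 2, 5, 10, 20]:
    joint = per_step_confidence ** n_steps  # assumes rough independence between steps
    print(f"{n_steps:2d} chained events -> {joint:.2f} confidence in the full chain")
# 1 -> 0.90, 2 -> 0.81, 5 -> 0.59, 10 -> 0.35, 20 -> 0.12
```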
If those ground-level patterns aren't a confident understanding of when different weaker AI benchmarks will/won't be hit, then there should be a different set of patterns confident forecasters can point to that underlie their predictions. I think you'd need to be able to show a basically unparalleled genius for spotting and extrapolating from historical trends in the development of similar technologies, or general trends in economic or scientific productivity.
I think Eliezer's skepticism is partly coming from Phil Tetlock's research on expert forecasting. Quoting Superforecasting:
> Taleb, Kahneman, and I agree that there is no evidence that geopolitical or economic forecasters can predict anything ten years out beyond the excruciatingly obvious – ‘there will be conflicts’ – and the odd lucky hits that are inevitable whenever lots of forecasters make lots of forecasts. These limits on predictability are the predictable results of the butterfly dynamics of nonlinear systems. In my EPJ research, the accuracy of expert predictions declined toward chance five years out. And yet, this sort of forecasting is common, even within institutions that should know better.
So while we can't rule out that making long-term predictions in AI is much easier than in other fields, there should be a strong presumption against that claim unless some kind of relevant extraordinarily rare gift for super-superprediction is shown somewhere or other. Like, I don't think it's impossible to make long-term predictions at all, but I think these generally need to be straightforward implications of really rock-solid general theories (e.g., in physics), not guesses about complicated social phenomena like 'when will such-and-such research community solve this hard engineering problem?' or 'when will such-and-such nation next go to war?'
(1) "Is general intelligence even a thing you can invent? Like, is there a single set of faculties underlying humans' ability to build software, design buildings that don't fall down, notice high-level analogies across domains, come up with new models of physics, etc.?"
(2) "If so, then does inventing general intelligence make it easy (unavoidable?) that your system will have all those competencies in fact?"
On 1, I don't see a reason to expect general intelligence to look really simple and monolithic once we figure it out. But one reason to think it's a thing at all, and not just a grab bag of narrow modules, is that humans couldn't have independently evolved specialized modules for everything we're good at, especially in the sciences.
We evolved to solve a particular weird set of cognitive problems; and then it turned out that when a relatively blind 'engineering' process tried to solve that set of problems through trial-and-error and incremental edits to primate brains, the solution it bumped into was also useful for innumerable science and engineering tasks that natural selection wasn't 'trying' to build in at all. If AGI turns out to be at all similar to that, then we should get a very wide range of capabilities cheaply in very quick succession. Particularly if we're actually trying to get there, unlike evolution.
On 2: Continuing with the human analogy, not all humans are genius polymaths. And AGI won't in-real-life be like a human, so we could presumably design AGI systems to have very different capability sets than humans do. I'm guessing that if AGI is put to very narrow uses, though, it will be because alignment problems were solved that let us deliberately limit system capabilities (like in https://intelligence.org/2017/02/28/using-machine-learning/), and not because we hit a 10-year wall where we can implement par-human software-writing algorithms but can't find any ways to leverage human+AGI intelligence to do other kinds of science/engineering work.
Those aren't exactly the questions I'm raising; I have no doubt that there exists some way to produce AGI. My concern is that 'can we build general intelligence?' doesn't seem like the right question to ask, since history suggests that humans are much better at building specialized devices first, and when it comes to AI risk the only system that really matters is the first one built.
The thing I'm pointing to is that there are certain (relatively) specialized tasks like 'par-human biotech innovation' that require more or less the same kind of thinking that you'd need for arbitrary tasks in the physical world.
You may need exposure to different training data in order to go from mastering chemistry to mastering physics, but you don't need a fundamentally different brain design or approach to reasoning, any more than you need fundamentally different kinds of airplane to fly over one land mass versus another, or fundamentally different kinds of scissors to cut some kinds of hair versus other kinds. There's just a limit to how much specialization the world actually requires. And, e.g., natural selection tried to build humans to solve a much narrower range of tasks than we ended up being good at; so it appears that whatever generality humans possess over and above what we were selected for, must be an example of "the physical world just doesn't require that much specialized hardware/software in order for you to perform pretty well".
If all of that's true, then the first par-human biotech-innovating AI may initially lack competencies in other sciences, but it will probably be doing the right kind of thinking to acquire those competencies given relevant data. A lot of the safety risks surrounding 'AI that can do scientific innovation' come from the fact that:
- the reasoning techniques required are likely to work well in a lot of different domains; and
- we don't know how to limit the topics AI systems "want" to think about (as opposed to limiting what they can think about) even in principle.
E.g., if you can just build a system that's as good as a human at chemistry, but doesn't have the capacity to think about any other topics, and doesn't have the desire or capacity to develop new capacities, then that might be pretty safe if you exercise ordinary levels of caution. But in fact (for reasons I haven't really gone into here directly) I think that par-human chemistry reasoning by default is likely to come with some other capacities, like competence at software engineering and various forms of abstract reasoning (mathematics, long-term planning and strategy, game theory, etc.).
This constellation of competencies is the main thing I'm worried about re AI, particularly if developers don't have a good grasp on when and how their systems possess those competencies.
> The thing I'm pointing to is that there are certain (relatively) specialized tasks like 'par-human biotech innovation' that require more or less the same kind of thinking that you'd need for arbitrary tasks in the physical world.
The same way Go requires AGI, and giving semantic descriptions of photos requires AGI, and producing accurate translations requires AGI?
Be extremely cautious when you make claims like these. There are certainly tasks that seem to require being humanly smart in distinctly human ways, but the only things I feel I could convincingly argue are in that category involve modelling humans and having human judges. Biotech is a particularly strong counterexample: not only is there no reason to believe our brand of socialized intelligence is particularly effective at it, but the only other thing that has seriously tried it has a much weaker claim to intelligence yet far outperforms us: natural selection.
It's easy to look at our lineage, from ape-like creatures to early humans to modern civilization, and draw a curve on which you can place intelligence, and then call this "general" and the semi-intelligent tools we've made so far "specialized", but in many ways this is just an illusion. It's easier to see this if you ignore humans and compare today's best AI against, say, chimps. In some regards a chimp seems like a general intelligence, albeit a weak one. It has high- and low-level cognition, it has memory, it is goal-directed but flexible. Our AIs don't come close. But a chimp can't translate text or play Go. It can't write code, however narrow a ___domain. Our AIs can.
When I say I expect the first genuinely dangerous AI to be specialized, I don't mean that it will be specific to one task; even neural networks seem to generalize surprisingly well in that way. I mean it won't have the assortment of abilities that we consider fundamental to what we think of as intelligence. It might have no real overarching structure that allows it to plan or learn. It might have no metacognition, and I'd bet against it having the ability to convincingly model people. But maybe if you point it at a network and tell it to break things before heading to bed, you'd wake up to a world on fire.
I agree that "that's Pascal's wager!" isn't a reasonable response to someone arguing that, say, a 1% or 10% extinction risk is worth taking seriously. If you think the probability is infinitesimally small but that we should work on it just in case, then that's more like Pascal's wager.
I think the whole discussion thread has a false premise, though. The main argument for working on AGI accident risk is that it's high-probability, not that it's 'low-probability but not super low.'
Roughly: it would be surprising if we didn't reach AGI this century; it would be surprising if AGI exhibited roughly human levels of real-world capability (in spite of potential hardware and software improvements over the brain) rather than shooting past human-par performance; and it would be surprising if it were easy to get robustly good outcomes out of AI systems much smarter than humans, operating in environments too complex for it to be feasible to specify desirable v. undesirable properties of outcomes. "It's really difficult to make reliable predictions about when and how people will make conceptual progress on a tough technological challenge, and there's a lot of uncertainty" doesn't imply "the probability of catastrophic accidents is <10%" or even "the probability of catastrophic accidents is <50%".
Speaking very roughly, fuzzy logic applies in cases where the degree of truth of a statement is in question. Logical uncertainty (about decidable sentences) applies in cases where the sentences are definitely true or false, but we lack the resources to figure out which.
So, for example, fuzzy logic might help you quantify to what extent someone is "tall," where tallness admits of degrees rather than being binary. Or it might help you quantify to what extent a proof is "long." But it won't tell you how to calculate the subjective probability that there exists a proof of some theorem that is no more than 500 characters long in some fixed language. For that, you either need to find a proof, exhaustively demonstrate that no proof exists, or find a way to reason under logical uncertainty; and we haven't found any ways to use fuzzy logic to make progress on formalizing inference under logical uncertainty.
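To make that contrast concrete, here's a minimal sketch (the membership function and numbers are my own, purely for illustration): fuzzy logic assigns a degree of truth to a vague predicate like "tall", while logical uncertainty assigns a credence to a crisp statement that is definitely true or false, such as "a proof of at most 500 characters exists".

```python
# Minimal sketch of the distinction (toy function and numbers, my own).

def fuzzy_tall(height_cm: float) -> float:
    """Degree of truth of 'x is tall': ramps from 0 at 160 cm to 1 at 190 cm."""
    return min(1.0, max(0.0, (height_cm - 160) / 30))

# Fuzzy logic: the statement itself holds to a degree.
print(fuzzy_tall(175.0))  # 0.5 -- "somewhat tall"

# Logical uncertainty: the statement is crisply true or false
# (a <=500-character proof either exists or it doesn't); the number
# below is a subjective credence, not a degree of truth.
credence_short_proof_exists = 0.3
```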
I agree that fuzzy logic wouldn't work for that purpose. But it does address a formalism for the foundations of what probabilities are, which, as far as I could see, was something you guys were working on as well. Just a thought.
As for actually addressing logical uncertainty and asymptotic convergence, I think subjective Bayesianism can be used in both cases. For example, you write "the axioms of probability theory force you to put probability either 0 or 1 on those statements", which I think is simply not true. If I as an "expert" claimed that "in my experience there is a 70% chance of the conjecture being correct", I can set "Prior(conjecture) = 0.7".
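For instance, here's what that move looks like as a toy update (all numbers invented for illustration): start from Prior(conjecture) = 0.7 and condition on some evidence E, say an important special case being verified, using Bayes' rule.

```python
# Toy subjective-Bayesian update on a conjecture (numbers invented for illustration).
prior = 0.7                 # Prior(conjecture), set by the "expert"

# Likelihoods of some evidence E, e.g. "an important special case checks out":
p_e_given_true = 0.95       # special cases nearly always hold if the conjecture is true
p_e_given_false = 0.40      # but sometimes hold even if it is false

posterior = (p_e_given_true * prior) / (
    p_e_given_true * prior + p_e_given_false * (1 - prior)
)
print(f"Posterior(conjecture | E) = {posterior:.2f}")  # ~0.85
```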
Speaking as a MIRI employee, I can say that MIRI isn't trying to build AI that's mathematically provable to be safe. This misconception comes from the same place the "AI safety engineering, etc." post is speaking to -- the assumption that if (e.g.) we do work in provability theory to develop simple general models of reflective reasoning, then the finished product must fall within provability theory.
Smarter-than-human AI systems will presumably reason probabilistically, and all real-world safety guarantees are probabilistic. But theorem-proving can be useful in some contexts for making us quantitatively more confident in systems' behavior (see https://intelligence.org/2013/10/03/proofs/), and toy models of theorem-proving agents can also be useful just for helping shore up our understanding of the problem space and of the formal tools that are likely to be relevant down the line -- the analog of "calculus" in the rocket example.
Stuart Russell (co-author of AI:MA, one of MIRI's research advisors) argues on http://edge.org/conversation/the-myth-of-ai#26015 that AI systems with "the ability to make high-quality decisions" (where "quality refers to the expected outcome utility of actions taken" and the utility function is represented in the system's programmed decision criteria) raise two problems:
"1. The utility function may not be perfectly aligned with the values of the human race, which are (at best) very difficult to pin down.
"2. Any sufficiently capable intelligent system will prefer to ensure its own continued existence and to acquire physical and computational resources – not for their own sake, but to succeed in its assigned task."
The first of those is what Bostrom calls "perverse instantiation" and Dietterich and Horvitz call the "Sorcerer's Apprentice" problem (http://cacm.acm.org/magazines/2015/10/192386-rise-of-concern...). The second of these is what Bostrom calls "convergent instrumental goals" and Omohundro calls "basic AI drives."
The first of these seems like a fairly obvious problem, if we think AI systems will ever be trusted with making important decisions. Human goals are complicated, and even a superintelligent system that can easily learn about our goals won't necessarily acquire the goals thereby. So solving the AI problem doesn't get us a solution to the goal specification problem for free.
The second of these also has some intuitive force; https://intelligence.org/?p=12234 shows Omohundro's idea can be stated formally, so it's not purely sci-fi. Averting the "Sorcerer's Apprentice" problem in full generality would mean averting this problem, since we'd then simply be able to give AI systems the right goals and let them go wild. Absent that, if AI systems become much more cognitively capable than humans, we'll probably need to actively work on some approach that violates Omohundro's assumptions (and the assumptions of the formalism above). Bostrom and MIRI both talk about a lot of interesting ideas along these lines.
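As a very rough illustration of the convergent-instrumental-goals point (a toy enumeration with invented payoffs, not the formal model in the linked paper): for several quite different terminal goals, the plan that first acquires extra resources scores higher, because resources help with nearly any objective.

```python
# Toy illustration of convergent instrumental goals (invented payoffs;
# not the formalism in the linked paper).

goals = ["cure a disease", "prove theorems", "maximize paperclips"]

def achievement(resources: float) -> float:
    # Stand-in for "how well a goal gets achieved given this much
    # compute/money/infrastructure"; more resources help across the board,
    # regardless of which terminal goal the system has.
    return min(1.0, 0.2 + 0.1 * resources)

for goal in goals:
    direct = achievement(resources=1.0)      # pursue the goal immediately
    grab_first = achievement(resources=5.0)  # acquire resources first, then pursue it
    best = "acquire resources first" if grab_first > direct else "pursue directly"
    print(f"{goal:22s} direct={direct:.2f} grab-first={grab_first:.2f} -> {best}")
```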
What is "an AI system with the ability to make high-quality decisions"? Do automated derivative trading models count? Do systems which decide how much to bid on a RTB ad exchange count?
The first problem is not new. We have a similar problem with some corporations, for example.
"A sufficiently capable intelligent system" is as real as "sufficiently hostile aliens". It's hard to argue and reason about a fictional system with a assortment of properties picked by someone aiming to spreading fear.
The problems aren't completely unprecedented (else we'd have basically no knowledge about them), but they become more severe in the scenarios Bostrom/Russell/etc. are talking about.
I would say that the central concern is with notional systems that can form detailed, accurate models of the world and efficiently search through the space of policies that can be expected to produce a given outcome according to the model. This can be a recommender system that tells other agents what policies to adopt, or it can execute the policies itself.
If the search process through policies is sufficiently counter-intuitive and opaque to operator inspection, the "Sorcerer's Apprentice" problem becomes much more severe than it is in ordinary software. As the system becomes more capable, it can look increasingly safe and useful in its current context and yet remain brittle in the face of changes to itself and its environment. This is also where convergent instrumental goals become more concerning, because systems with imperfectly understood/designed policy selection criteria (introducing an element of randomness, from our perspective) seem likely to converge on adversarial policies due to the general fact of resource limitations.
There's no reason to think this kind of system is inevitable, but it's worth investigating how likely we are to be able to develop superhuman planning/decision agents, on what timescale, and whether there are any actions we could take in advance to make it possible to use such systems safely. At this point not enough research-hours have gone into this topic to justify any strong conclusions about whether we can (or can't) make much progress today.
I replied to this here: https://news.ycombinator.com/item?id=10721068. Short answer is that collaborations don't look unlikely, and we'll be able to say more when OpenAI's been up and running longer.
We're on good terms with the people at OpenAI, and we're very excited to see new AI teams cropping up with an explicit interest in making AI's long-term impact a positive one. Nate Soares is in contact with Greg Brockman and Sam Altman, and our teams are planning to spend time talking over the coming months.
It's too early to say what sort of relationship we'll develop, but I expect some collaborations. We're hopeful that the addition of OpenAI to this space will result in promising new AI alignment research in addition to AI capabilities research.