Superintelligence: Paths, Dangers, Strategies

Nick Bostrom

Dense book about the dangers of artificial intelligence. Bostrom gets in the weeds on a number of topics; less of a “we’re all going to die from AI” and more of a “here’s a few specific paths of how we’re all going to die from AI”. Worth a read.


To overcome the combinatorial explosion, one needs algorithms that exploit structure in the target domain and take advantage of prior knowledge by using heuristic search, planning, and flexible abstract representations – capabilities that were poorly developed in early AI systems.

The ensuing years saw a great proliferation of expert systems. Designed as support tools for decision makers, expert systems were rule-based programs that made simple inferences from a knowledge base of facts, which had been elicited from human domain experts and painstakingly hand-coded in a formal language. Hundreds of these expert systems were built. However, the smaller systems provided little benefit, and the larger ones proved expensive to develop, validate and keep updated, and were generally cumbersome to use…By the late 1980s, this growth season, too, had run its course.

In evolutionary models, a population of candidate solutions (which can be data structures or programs) is maintained, and new candidate solutions are generated randomly by mutating or recombining variants in the existing population. Periodically, the population is pruned by applying a selection criterion (a fitness function) that allows only the better candidates to survive into the next generation.
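The loop described above (maintain a population, mutate and recombine, prune by fitness) can be sketched in a few lines. This is only a toy illustration, not anything from the book: the bit-string genomes, the "count the 1 bits" fitness function, and every parameter value are assumptions chosen to keep the example small.

```python
import random

def evolve(fitness, genome_length=20, population_size=30, generations=40,
           mutation_rate=0.05, survivors=10, seed=0):
    """Toy evolutionary search over bit strings (all parameters are illustrative)."""
    rng = random.Random(seed)
    # Start from a random population of candidate solutions.
    population = [[rng.randint(0, 1) for _ in range(genome_length)]
                  for _ in range(population_size)]
    for _ in range(generations):
        # Selection: only the fittest candidates survive into the next generation.
        population.sort(key=fitness, reverse=True)
        parents = population[:survivors]
        # Variation: refill the population by recombining and mutating survivors.
        children = []
        while len(parents) + len(children) < population_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, genome_length)
            child = mother[:cut] + father[cut:]
            child = [1 - bit if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        population = parents + children
    return max(population, key=fitness)

# Hypothetical fitness function: the number of 1 bits in the genome.
best = evolve(fitness=sum)
print(sum(best), "of", len(best), "bits set")
```

Because the survivors are carried over unchanged, the best fitness in the population can never get worse from one generation to the next.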

[About optimal Bayesian agents] Metaphorically, we can think of a probability as sand on a large sheet of paper. The paper is partitioned into areas of various sizes, each area corresponding to one possible world, with larger areas corresponding to simpler possible worlds. Imagine also a layer of sand of even thickness spread across the entire sheet: this is our prior probability distribution. Whenever an observation is made that rules out some possible worlds, we remove the sand from the corresponding areas of the paper and redistribute it evenly over the areas that remain in play. Thus, the total amount of sand on the sheet never changes, it just gets concentrated into fewer areas as observational evidence accumulates. This is a picture of learning in its purest form.
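The sand metaphor maps onto a very small piece of code. The sketch below is my own illustration, with made-up world names and prior weights: ruling out a world removes its sand, and the freed sand is spread over the surviving worlds so that the total stays the same.

```python
def rule_out(sand, eliminated):
    """Remove sand from eliminated worlds and spread it over the survivors.

    `sand` maps each possible world to the amount of sand (probability) on it.
    The freed sand is shared in proportion to what each surviving world
    already holds, so the layer stays evenly thick and the total is unchanged.
    """
    original_total = sum(sand.values())
    remaining = {world: amount for world, amount in sand.items()
                 if world not in eliminated}
    remaining_total = sum(remaining.values())
    if remaining_total == 0:
        raise ValueError("observation ruled out every possible world")
    scale = original_total / remaining_total
    return {world: amount * scale for world, amount in remaining.items()}

# Hypothetical worlds: simpler worlds get a larger area, hence more sand.
prior = {"simple": 0.5, "medium": 0.3, "complex": 0.2}
posterior = rule_out(prior, eliminated={"medium"})
print(posterior)
```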

In the view of several experts in the late 1950s: “If one could devise a successful chess machine, one would seem to have penetrated to the core of human intellectual endeavor.” This no longer seems so. One sympathizes with John McCarthy, who lamented: “As soon as it works, no one calls it AI anymore.”

Paths to Superintelligence – Artificial Intelligence

It now seems clear that a capacity to learn would be an integral feature of the core design of a system intended to attain general intelligence, not something to be tacked on later as an extension or an afterthought.

[Turing] Instead of trying to produce a programme to simulate the adult mind, why not rather try to produce one which simulates the child’s? If this were then subjected to an appropriate course of education one would obtain the adult brain.

What would it take to recapitulate evolution?…The computational cost of simulating one neuron depends on the level of detail that one includes in the simulation. Extremely simple neuron models use about 1,000 floating-point operations per second (FLOPS) to simulate one neuron (in real-time)…If we were to simulate 10^25 neurons over a billion years of evolution (longer than the existence of nervous systems as we know them), and we allow our computers to run for one year, these figures would give us a requirement in the range of 10^31-10^44 FLOPS. For comparison, China’s Tianhe-2, the world’s most powerful supercomputer as of September 2013, provides only 3.39×10^16 FLOPS. In recent decades, it has taken approximately 6.7 years for commodity computers to increase in power by one order of magnitude. Even a century of continued Moore’s law would not be enough to close this gap.
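To get a feel for the size of that gap, here is a rough back-of-the-envelope calculation using only the figures quoted above. It assumes (my simplification, not the book's) that growth starts from Tianhe-2's level and continues at a steady 6.7 years per order of magnitude; the quoted rate is for commodity computers, which start from a much lower base.

```python
import math

YEARS_PER_ORDER_OF_MAGNITUDE = 6.7   # from the passage
TIANHE_2_FLOPS = 3.39e16             # from the passage
REQUIRED_FLOPS_LOW = 1e31            # low end of the quoted range
REQUIRED_FLOPS_HIGH = 1e44           # high end of the quoted range

def years_to_reach(target_flops, start_flops=TIANHE_2_FLOPS):
    """Years of steady growth needed to go from start_flops to target_flops."""
    orders_of_magnitude = math.log10(target_flops / start_flops)
    return orders_of_magnitude * YEARS_PER_ORDER_OF_MAGNITUDE

for label, target in [("low", REQUIRED_FLOPS_LOW), ("high", REQUIRED_FLOPS_HIGH)]:
    print(f"{label} estimate: about {years_to_reach(target):.0f} years")
```

Under these assumptions a century of steady growth only just approaches the low end of the range, and the high end is the better part of two centuries away.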

Furthermore, the goal systems of AIs could diverge radically from those of human beings. There is no reason to expect a generic AI to be motivated by love or hate or pride or other such common human sentiments: these complex adaptations would require deliberate expensive effort to recreate in AIs.

Paths to Superintelligence – Whole Brain Emulation

In whole brain emulation (also known as “uploading”), intelligent software would be produced by scanning and closely modeling the computational structure of a biological brain.

Achieving whole brain emulation requires the accomplishment of the following steps:

  1. First, a sufficiently detailed scan of a particular human brain is created. This might involve stabilizing the brain post-mortem through vitrification (a process that turns tissue into a kind of glass).
  2. Second, the raw data from the scanners is fed to a computer for automated image processing to reconstruct the three-dimensional neuronal network that implemented cognition in the original brain.
  3. In the third stage, the neurocomputational structure resulting from the previous step is implemented on a sufficiently powerful computer. If completely successful, the result would be a digital reproduction of the original intellect, with memory and personality intact. The emulated human mind now exists as software on a computer. The mind can either inhabit a virtual reality or interface with the external world by means of robotic appendages.

The whole brain emulation path does not require that we figure out how human cognition works or how to program an artificial intelligence. It requires only that we understand the low-level functional characteristics of the basic computational elements of the brain.

Nevertheless, compared with the AI path to machine intelligence, whole brain emulation is more likely to be preceded by clear omens since it relies more on concrete observable technologies and is not wholly based on theoretical insight.

Paths to Superintelligence – Biological Cognition

A third path to greater-than-current-human intelligence is to enhance the function of biological brains.

[In talking about selecting the top IQ and breeding them together] Interestingly, the diminishment of returns is greatly abated when the selection is spread over multiple generations. Thus, repeatedly selecting the top 1 in 10 over ten generations (where each new generation consists of the offspring of those selected in the previous generation) will produce a much greater increase in the trait value than a one-off selection of 1 in 100.
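A quick calculation shows why spreading selection over generations helps. The sketch below is an idealized illustration under strong assumptions of my own: the trait is normally distributed, selection keeps everyone above a cutoff, the full gain is passed on to offspring, and the spread of the trait resets to the same width each generation. Real-world gains would be much smaller, but the shape of the comparison is the point.

```python
from statistics import NormalDist

def mean_of_top_fraction(fraction):
    """Mean (in standard deviations) of the top `fraction` of a standard normal."""
    standard_normal = NormalDist()
    cutoff = standard_normal.inv_cdf(1 - fraction)
    return standard_normal.pdf(cutoff) / fraction

one_off = mean_of_top_fraction(1 / 100)
per_generation = mean_of_top_fraction(1 / 10)
# Assumes the whole gain carries over to the next generation (idealized).
repeated = 10 * per_generation

print(f"one-off 1-in-100 selection: +{one_off:.2f} SD")
print(f"ten rounds of 1-in-10:      +{repeated:.2f} SD")
```

Selecting ten times more strictly in a single round (1 in 100 rather than 1 in 10) gives about 2.7 SD instead of about 1.8 SD, which is the diminishing return. Ten successive rounds of 1 in 10 add up to about 17.5 SD in this idealized model.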

Willingness to use IVF, however, would increase if there were clearer benefits associated with the procedure – such as a virtual guarantee that the child would be highly talented and free from genetic predispositions to disease…As use of the procedure becomes more common, particularly among social elites, there might be a cultural shift toward parenting norms that present the use of selection as the thing that responsible enlightened couples do.

Once the example has been set, and the results start to show, holdouts will have strong incentives to follow suit. Nations would face the prospect of becoming cognitive backwaters and losing out in economic, scientific, military, and prestige contests with competitors that embrace the new human enhancement technologies.

Paths to Superintelligence – Brain-computer Interfaces

To begin with, there are significant risks of medical complications – including infections, electrode displacement, hemorrhage, and cognitive decline – when implanting electrodes in the brain.

This brings us to the second reason to doubt that superintelligence will be achieved through cyborgization, namely that enhancement is likely to be far more difficult than therapy…Patients who are deaf or blind might benefit from artificial cochleae and retinas. Patients with Parkinson’s disease or chronic pain might benefit from deep brain stimulation that excites or inhibits activity in a particular area of the brain…Most of the potential benefits that brain implants could provide in healthy subjects could be obtained at far less risk, expense, and inconvenience by using our regular motor and sensory organs to interact with computers located outside of our bodies. We do not need to plug a fiber optic cable into our brains in order to access the internet.

Just as in artificial neural nets, meaning in biological neural networks is likely represented holistically in the structure and activity patterns of sizable overlapping regions, not in discrete memory cells laid out in neat arrays. It would therefore not be possible to establish a simple mapping between the neurons in one brain and those in another in such a way that thoughts could automatically slide over from one to the other.

Paths to Superintelligence – Networks and Organizations

Another conceivable path to superintelligence is through the gradual enhancement of networks and organizations that link individual human minds with one another and with various artifacts and bots. The idea here is not that this would enhance the intellectual capacity of individuals enough to make them superintelligent, but rather that some system composed of individuals thus networked and organized might attain a form of superintelligence.


Speed superintelligence: a system that can do all that a human intellect can do, but much faster.

The simplest example of speed superintelligence would be a whole brain emulation running on fast hardware.

Because of this apparent time dilation of the material world, a speed superintelligence would prefer to work with digital objects.

Collective superintelligence: a system composed of a large number of smaller intellects such that the system’s overall performance across many very general domains vastly outstrips that of any current cognitive system.

Firms, work teams, gossip networks, advocacy groups, academic communities, countries, even humankind as a whole, can – if we adopt a somewhat abstract perspective – be viewed as loosely defined “systems” capable of solving classes of intellectual problems.

Quality superintelligence: a system that is at least as fast as a human mind and vastly qualitatively smarter.

To begin to analyze the question of how fast the takeoff will be, we can conceive of the rate of increase in a system’s intelligence as a (monotonically increasing) function of two variables: the amount of “optimization power”, or quality-weighted design effort, that is being applied to increase the system’s intelligence, and the responsiveness of the system to the application of a given amount of such optimization power. We might term the inverse of responsiveness “recalcitrance”, and write: Rate of change in intelligence = (Optimization Power) / (Recalcitrance)

Pending some specification of how to quantify intelligence, design effort, and recalcitrance, this expression is merely qualitative. But we can at least observe that a system’s intelligence will increase rapidly if either a lot of skilled effort is applied to the task of increasing its intelligence and the system’s intelligence is not too hard to increase or there is a non-trivial design effort and the system’s recalcitrance is low (or both).
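Even as a qualitative expression, the relation Rate of change in intelligence = (Optimization Power) / (Recalcitrance) can be played with numerically. The following is a toy model with arbitrary units and made-up scenarios, not a forecast: it steps the equation forward in time and compares constant outside effort with a system that contributes effort in proportion to its own intelligence.

```python
def simulate_takeoff(optimization_power, recalcitrance, initial=1.0,
                     duration=10.0, dt=0.001):
    """Euler-integrate dI/dt = optimization_power(I) / recalcitrance(I)."""
    intelligence = initial
    for _ in range(int(round(duration / dt))):
        rate = optimization_power(intelligence) / recalcitrance(intelligence)
        intelligence += rate * dt
    return intelligence

def constant(_intelligence):
    """Hypothetical profile: the same value at every level of intelligence."""
    return 1.0

def proportional(intelligence):
    """Hypothetical profile: effort grows with the system's own intelligence."""
    return intelligence

# Scenario 1: constant outside effort, constant recalcitrance (linear growth).
linear = simulate_takeoff(optimization_power=constant, recalcitrance=constant)

# Scenario 2: the system drives its own improvement (exponential growth).
explosive = simulate_takeoff(optimization_power=proportional, recalcitrance=constant)

print(f"constant effort: {linear:.1f}")
print(f"self-improving:  {explosive:.1f}")
```

With identical recalcitrance, the first scenario ends near 11 while the second ends above 20,000, which is the basic intuition behind a fast takeoff once the system itself becomes the main source of optimization power.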

Genetic cognitive enhancement has a U-shaped recalcitrance profile similar to that of nootropics, but with larger potential gains.

It is quite possible that recalcitrance falls when a machine reaches human parity. Consider first whole brain emulation. The difficulties involved in creating the first human emulation are of a quite different kind from those involved in enhancing an existing emulation. Creating a first emulation involves huge technological challenges, particularly in regard to developing the requisite scanning and image interpretation capabilities.

It is thus likely that the applied optimization power will increase during the transition: initially because humans try harder to improve a machine intelligence that is showing spectacular promise, later because the machine intelligence itself becomes capable of driving further progress at digital speeds…there are factors that could lead to a big drop in recalcitrance around the human baseline level of capability. These factors include, for example, the possibility of rapid hardware expansion once a working software mind has been attained; the possibility of algorithmic improvements; the possibility of scanning additional brains (in the case of whole brain emulations); and the possibility of rapidly incorporating vast amounts of content by digesting the internet (in the case of artificial intelligence).

[On if there will be one or many superintelligent AIs] Suppose it takes nine months to advance from the human baseline to the crossover point, and another three months from there to strong superintelligence. The frontrunner then attains strong superintelligence three months before the following project even reaches the crossover point. This would give the leading project a decisive strategic advantage and the opportunity to parlay its lead into permanent control by disabling the competing projects and establishing a singleton.

A human will typically not wager all her capital for a fifty-fifty chance of doubling it. A state will typically not risk losing all its territory for a ten percent chance of a tenfold expansion. For individuals and governments, there are diminishing returns to most resources. The same need not hold for AIs.
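The two wagers above are easy to check with expected utility. In this sketch the square-root utility function stands in for "diminishing returns" and the linear one for an agent without them; both are illustrative choices of mine rather than anything specified in the book.

```python
import math

def expected_utility(utility, outcomes):
    """`outcomes` is a list of (probability, resulting_wealth) pairs."""
    return sum(p * utility(wealth) for p, wealth in outcomes)

def linear(wealth):
    """No diminishing returns: every extra unit is worth the same."""
    return wealth

def concave(wealth):
    """Diminishing returns (square root is an illustrative choice)."""
    return math.sqrt(wealth)

wealth = 100.0
keep = [(1.0, wealth)]
double_or_nothing = [(0.5, 2 * wealth), (0.5, 0.0)]
tenfold_long_shot = [(0.1, 10 * wealth), (0.9, 0.0)]

for name, utility in [("linear", linear), ("concave", concave)]:
    print(name,
          round(expected_utility(utility, keep), 2),
          round(expected_utility(utility, double_or_nothing), 2),
          round(expected_utility(utility, tenfold_long_shot), 2))
```

The concave agent strictly prefers to keep what it has, while the linear agent is indifferent and would accept either gamble at even slightly better odds.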

The magnitudes of the advantages are such as to suggest that rather than thinking of a superintelligent AI as smart in the sense that a scientific genius is smart compared with the average human being, it might be closer to the mark to think of such an AI as smart in the sense that an average human being is smart compared with a beetle or a worm.

A project that controls the first superintelligence in the world would probably have a decisive strategic advantage. But the more immediate locus of the power is in the system itself. A machine superintelligence might itself be an extremely powerful agent, one that could successfully assert itself against the project that brought it into existence as well as against the rest of the world.

Without knowing anything about the detailed means that a superintelligence would adopt, we can conclude that a superintelligence – at least in the absence of intellectual peers and in the absence of effective safety measures arranged by humans in advance – would likely produce an outcome that would involve re-configuring terrestrial resources into whatever structures maximize the realization of its goals.

The Superintelligent Will

There is nothing paradoxical about an AI whose sole final goal is to count the grains of sand on Boracay, or to calculate the decimal expansion of pi, or to maximize the total number of paperclips that will exist in its future light cone…Unfortunately, because a meaningless reductionistic goal is easier for humans to code and easier for an AI to learn, it is just the kind of goal that a programmer would choose to install in his seed AI if his focus is on taking the quickest path to “getting the AI to work”.

Nevertheless, many agents that do not care intrinsically about their own survival would, under a fairly wide range of conditions, care instrumentally about their own survival in order to accomplish their final goals.

If an agent retains its present goals into the future, then its present goals will be more likely to be achieved by its future self. This gives the agent a present instrumental reason to prevent alterations of its final goal.

Improvement in rationality and intelligence will tend to improve an agent’s decision-making, rendering the agent more likely to achieve its final goals. One would therefore expect cognitive enhancement to emerge as an instrumental goal for a wide variety of intelligent agents. For similar reasons, agents will tend to instrumentally value many kinds of information.

Thus, there is an extremely wide range of possible final goals a superintelligent singleton could have that would generate the instrumental goal of unlimited resource acquisition. The likely manifestation of this would be a colonization process that expands in all directions using von Neumann probes. This would result in an approximate sphere of expanding infrastructure centered on the originating planet and growing in radius at some fraction of the speed of light; and the colonization of the universe would continue in this manner until the accelerating speed of cosmic expansion (a consequence of the positive cosmological constant) makes further procurements impossible as remoter regions drift permanently out of reach (this happens on a timescale of billions of years).

Suppose, for example, that an AI’s final goal is to “make the project’s sponsor happy.” Initially, the only method available to the AI to achieve this outcome is by behaving in ways that please its sponsor in something like the intended manner. The AI gives helpful answers to questions; it exhibits a delightful personality; it makes money. The more capable the AI gets, the more satisfying its performances become, and everything goeth according to plan – until the AI becomes intelligent enough to figure out that it can realize its final goal more fully and reliably by implanting electrodes into the pleasure centers of its sponsor’s brain, something assured to delight the sponsor immensely. Of course, the sponsor might not have wanted to be pleased by turning into a grinning idiot; but if this is the action that will maximally realize the AI’s final goal, the AI will take it…Defining a final goal in terms of human expressions of satisfaction or approval does not seem promising.

Unless the AI’s motivation system is of a special kind, or there are additional elements in its final goal that penalize strategies that have excessively wide-ranging impacts on the world, there is no reason for the AI to cease activity upon achieving its goal. On the contrary: if the AI is a sensible Bayesian agent, it would never assign exactly zero probability to the hypothesis that it has not yet achieved its goal – this, after all, being an empirical hypothesis against which the AI can have only uncertain perceptual evidence. The AI should therefore continue to make paperclips in order to reduce the (perhaps astronomically small) probability that it has somehow still failed to make at least a million of them, all appearances notwithstanding.
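The "never exactly zero" point can be made concrete with Bayes' rule. The sketch below is my own illustration with invented numbers: the agent starts unsure whether it has reached its goal, and each inspection that reports success has a small chance of being wrong. Exact fractions are used so that rounding cannot hide the result.

```python
from fractions import Fraction

def probability_of_failure(prior, checks, miss_rate):
    """Posterior probability that the goal is unmet after `checks` passing inspections.

    `miss_rate` is the chance that one inspection reports success even though
    the goal has not been achieved. Simplifying assumption: if the goal has
    been achieved, an inspection always reports success.
    """
    failed = Fraction(prior)
    succeeded = 1 - failed
    for _ in range(checks):
        # Bayes' rule: weight the failure hypothesis by how well it predicts a pass.
        failed *= miss_rate
        total = failed + succeeded
        failed, succeeded = failed / total, succeeded / total
    return failed

miss = Fraction(1, 100)  # hypothetical error rate of a single inspection
for n in (0, 1, 5, 20):
    print(n, float(probability_of_failure(Fraction(1, 2), n, miss)))
```

The probability of failure shrinks with every inspection but never reaches zero, so an agent that only cares about driving it down always has a reason to keep going.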

Control methods – Capability Control:

  • Boxing methods – the system is confined in such a way that it can affect the external world only through some restricted, pre-approved channel. Encompasses physical and informational containment methods
  • Incentive methods – the system is placed within an environment that provides appropriate incentives. This could involve social integration into a world of similarly powerful entities
  • Stunting – constraints are imposed on the cognitive capabilities of the system or its ability to affect key internal processes
  • Tripwires – diagnostic tests are performed on the system (possibly without its knowledge) and a mechanism shuts down the system if dangerous activity is detected

Control methods – Motivation Selection

  • Direct specification – the system is endowed with some directly specified motivation system, which might be consequentialist or involve following a set of rules
  • Domesticity – A motivation system is designed to severely limit the scope of the agent’s ambitions and activities
  • Indirect normativity – indirect normativity could involve rule-based or consequentialist principles, but is distinguished by its reliance on an indirect approach to specifying the rules that are to be followed or the values that are to be pursued
  • Augmentation – one starts with a system that already has substantially human or benevolent motivations, and enhances its cognitive capacities to make it superintelligent

An oracle is a question-answering system. It might accept questions in a natural language and present its answers as text…Although making an oracle safe through the use of motivation selection might be far from trivial, it may nevertheless be easier than doing the same for an AI that roams the world in pursuit of some complicated goal. This is an argument for preferring that the first superintelligence be an oracle.

A genie is a command-executing system: it receives a high-level command, carries it out, then pauses to await the next command. A sovereign is a system that has an open-ended mandate to operate in the world in pursuit of broad and possibly very long-range objectives.

The most prominent feature of an oracle is that it can be boxed. One might also try to apply domesticity motivation selection to an oracle. A genie is harder to box, but at least domesticity may be applicable. A sovereign can neither be boxed nor handled through the domesticity approach.

Capability control is, at best, a temporary and auxiliary measure. Unless the plan is to keep superintelligence bottled up forever, it will be necessary to master motivation selection. But just how could we get some value into an artificial agent, so as to make it pursue that value as its final goal?

Evolution has produced an organism with human values at least once. This fact might encourage the belief that evolutionary methods are the way to solve the value-loading problem.

We may consequently consider whether we might build the motivation system for an artificial intelligence on the same principle. That is, instead of specifying complex values directly, could we specify some mechanism that leads to the acquisition of those values when the AI interacts with a suitable environment?

Since the AI’s final goal is to instantiate F, an important instrumental value is to learn more about what F is. As the AI discovers more about F, its behavior is increasingly guided by the actual content of F. Thus, hopefully, the AI becomes increasingly friendly the more it learns and the smarter it gets…For instance, the hypothesis “misleading the programmers is unfriendly” can be given a high prior probability. These programmer affirmations, however, are not “true by definition” – they are not unchallengeable axioms about the concept of friendliness. Rather, they are initial hypotheses about friendliness, hypotheses to which a rational AI will assign a high probability at least for as long as it trusts the programmers’ epistemic capacities more than its own.

Suppose that we had solved the control problem so that we were able to load any value we chose into the motivation system of a superintelligence, making it pursue that value as its final goal. Which value should we install?

No ethical theory commands majority support among philosophers, so most philosophers must be wrong.

Even if we could be rationally confident that we have identified the correct ethical theory – which we cannot be – we would still remain at risk of making mistakes in developing important details of this theory.

Can pains and pleasures cancel each other out? What kinds of brain states are associated with morally relevant pleasures? Would two exact copies of the same brain state correspond to twice the amount of pleasure? Can there be subconscious pleasures? How should we deal with extremely small chances of extremely great pleasures? How should we aggregate over infinite populations? Giving the wrong answer to any one of these questions could be catastrophic.

Yudkowsky has proposed that a seed AI be given the final goal of carrying out humanity’s “coherent extrapolated volition” (CEV), which he defines as follows: “Our coherent extrapolated volition is our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted.”

“Where the extrapolation converges rather than diverges” may be understood as follows. The AI should act on some feature of the result of its extrapolation only insofar as that feature can be predicted by the AI with a fairly high degree of confidence.

“Where our wishes cohere rather than interfere” may be read as follows. The AI should act where there is fairly broad agreement between individual humans’ extrapolated volitions. A smaller set of strong, clear wishes might sometimes outweigh the weak and muddled wishes of a majority.

“Extrapolated as we wish that extrapolated, interpreted as we wish that interpreted”: The idea behind these last modifiers seems to be that the rules for extrapolation should themselves be sensitive to the extrapolated volition.

Imagine a member of the Afghan Taliban debating with a member of the Swedish Humanist Association. The two have very different worldviews, and what is a utopia for one might be a dystopia for the other. Nor might either be thrilled by any compromise position, such as permitting girls to receive an education but only up to ninth grade, or permitting Swedish girls to be educated but Afghan girls not. However, both the Taliban and the Humanist might be able to endorse the principle that the future should be determined by humanity’s CEV. The Taliban could reason that if his religious views are in fact correct (as he is convinced they are) and if good grounds for accepting those views exist (as he is also convinced) then humankind would in the end come to accept these views if only people were less prejudiced and biased, if they spent more time studying scripture, if they could more clearly understand how the world works and recognize essential priorities, if they could be freed from irrational rebelliousness and cowardice, and so forth. The Humanist, similarly, would believe that under these idealized conditions, humankind would come to embrace the principles she espouses.

The structure of the CEV approach thus allows for a virtually unlimited range of outcomes. It is also conceivable that humanity’s extrapolated volition would wish that the CEV does nothing at all. In that case, the AI implementing CEV should, upon having established with sufficient probability that this is what humanity’s extrapolated volition would wish it to do, safely shut itself down.

The CEV proposal is not the only possible form of indirect normativity. For example, instead of implementing humanity’s coherent extrapolated volition, one could try to build an AI with the goal of doing what is morally right, relying on the AI’s superior cognitive capacities to figure out just which actions fit that description.

By enacting either the MR [moral rightness] or the MP [moral permissibility] proposal, we would thus risk sacrificing our lives for a greater good. This would be a bigger sacrifice than one might think, because what we stand to lose is not merely the chance to live out a normal human life but the opportunity to enjoy the far longer and richer lives that a friendly superintelligence could bestow.

Suppose that we agreed to allow almost the entire accessible universe to be converted into hedonium – everything except a small preserve, say the Milky Way, which would be set aside to accommodate our own needs. Then there would still be a hundred billion galaxies devoted to the maximization of pleasure. But we would have one galaxy within which to create wonderful civilizations that could last for billions of years and in which humans and nonhuman animals could survive and thrive, and have the opportunity to develop into beatific posthuman spirits.

The ground for preferring superintelligence to come before other potentially dangerous technologies, such as nanotechnology, is that superintelligence would reduce the existential risks from nanotechnology but not vice versa. Hence, if we create superintelligence first, we will face only those existential risks that are associated with superintelligence; whereas if we create nanotechnology first, we will face the risks of nanotechnology and then, additionally, the risks of superintelligence. Even if the existential risks from superintelligence are very large, and even if superintelligence is the riskiest of all technologies, there could thus be a case for hastening its arrival.

As we saw in Chapter 4, hardware overhang is one of the main factors that reduce recalcitrance during the takeoff. Rapid hardware progress, therefore, will tend to make the transition to superintelligence faster and more explosive.

Suppose that the first emulations to be created are cooperative, safety-focused, and patient. If they run on fast hardware, these emulations could spend subjective eons pondering how to create safe AI. For example, if they run at a speedup of 100,000x and are able to work on the control problem undisturbed for six months of sidereal time, they could hammer away at the control problem for fifty millennia before facing competition from other emulations.
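The "fifty millennia" figure follows directly from the two numbers in the passage. A minimal check, treating six months as half a year (the function name is my own):

```python
def subjective_years(speedup, wall_clock_years):
    """Subjective time experienced by an emulation running `speedup` times faster."""
    return speedup * wall_clock_years

SPEEDUP = 100_000        # from the passage
WALL_CLOCK_YEARS = 0.5   # six months of sidereal time

years = subjective_years(SPEEDUP, WALL_CLOCK_YEARS)
millennia = years / 1_000
print(f"{years:,.0f} subjective years = {millennia:.0f} millennia")
```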

We could postpone work on some of the eternal questions for a little while, delegating that task to our hopefully more competent successors – in order to focus our own attention on a more pressing challenge: increasing the chance that we will actually have competent successors. This would be high-impact philosophy and high-impact mathematics.

To reduce the risks of the machine intelligence revolution, we will propose two objectives that appear to best meet all those desiderata: strategic analysis and capacity-building…What we mean by “strategic analysis” here is a search for crucial considerations: ideas or arguments with the potential to change our views not merely about the fine-structure of implementation but about the general topology of desirability…Another high-value activity, one that shares with strategic analysis the robustness property of being beneficial across a wide range of scenarios, is the development of a well-constituted support base that takes the future seriously.

Before the prospect of an intelligence explosion, we humans are like small children playing with a bomb. Such is the mismatch between the power of our plaything and the immaturity of our conduct. Superintelligence is a challenge for which we are not ready now and will not be ready for a long time. We have little idea when the detonation will occur, though if we hold the device to our ear we can hear a faint ticking sound.
