Introduction
One of the most interesting questions we’ve ever pondered on the ai-philosophy mailing list was how you would build an “angelic” autonomous AI. Would it be possible to make some kind of angel’s mind that, by design, achieves only good? Philosophically speaking, is there any gold standard of ethics (since the angel is just a mythological fantasy)? Here is the original discussion for reference. In this post, I would like to extend the ideas there a bit, also discussing what I consider to be malevolent objective functions, as well as the limitations of the objectives that I present.
This is also a question that has attracted ethically naive answers, and as far as I can tell, all that the answerers have been able to come up with so far is an expression of their self-interest: that machines would somehow be “beneficial” if they served humans, or that they would be “good” if they followed simple utilitarian formulations, without persuasively explaining what their utility should be.
I do not think this is truly a matter of scientific debate, so I will take it a bit lightly here. It’s quite philosophical, of course, and you may treat the present essay as an extended abstract.
From my post in 2008:
My first approach was to consider what we consider “evil”. I suspect that a prior source of all evil acts is selfish thinking, which neglects the rest of the world. And that is one great blunder. Being selfish is not only evil but foolish as well. Thus, my current approach would be to try to design a “selfless” utility function, i.e. one that maintains the benefit of the whole world instead of the individual. Other important questions were considered as well. Such an AI must be economically aware; it must lean towards fair allocation of resources, instead of selfish (and globally suboptimal) resource allocation strategies. A scientific instinct could be useful, as it would go about preserving and producing information. It might have an instinct to pervade and support life and culture throughout the universe. Consider also that a neutral agent cannot be considered “good”, as it is not interested in what is going on around it, i.e. it would not help anyone.
Please note that we are not assuming that any of the subsequent designs is easily computable; rather, we assume that they can be executed by a trans-sapient general AI system. We assume an autonomous Artificial General Intelligence (AGI) design, either based on reinforcement learning that maximizes a utility function (as in AIXI), or on a goal-directed agent that derives sub-goals from a top-level goal. Thus, we state the designs as high-level objectives or meta-rules, but we do not explicitly state how they are implemented. Perhaps that is for a scientific paper.
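To make this framing a bit more concrete, here is a minimal sketch in Python of how a meta-rule could be stated as a utility function that an otherwise unspecified agent tries to maximize. Everything here is hypothetical: the names (`MetaRule`, `ExpectedUtilityAgent`), the world-state attributes, and the world model are placeholders, since the essay deliberately leaves the implementation open.

```python
from abc import ABC, abstractmethod

class MetaRule(ABC):
    """A top-level objective: scores hypothetical future world-states."""

    @abstractmethod
    def utility(self, world_state) -> float:
        ...

class PreserveLifeAndCulture(MetaRule):
    """Meta-rule 1, stated only schematically: reward the predicted variety
    of life and the amount of accumulated cultural information."""

    def utility(self, world_state) -> float:
        # Both attributes are placeholders for quantities a trans-sapient
        # AI would have to estimate from its world model.
        return world_state.variety_of_life + world_state.cultural_information

class ExpectedUtilityAgent:
    """Chooses the action whose predicted outcome the meta-rule scores highest.
    The world model (its predict method) is assumed, not specified."""

    def __init__(self, meta_rule: MetaRule, world_model):
        self.meta_rule = meta_rule
        self.world_model = world_model

    def act(self, current_state, candidate_actions):
        return max(
            candidate_actions,
            key=lambda action: self.meta_rule.utility(
                self.world_model.predict(current_state, action)
            ),
        )
```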
I propose that we should examine idealized, highly abstract and general meta-rules that do not depend in any way whatsoever on human culture, which is possibly biased in a way that will not be fitting for a computational deity or its humble subjects. First, we review what I consider to be benevolent meta-rules, and following them I also review malevolent meta-rules, to maintain the balance. I will present them so as to convince you that it is not nearly as easy as it sounds to distinguish good from evil, for no Platonic form of good, or evil, ever exists, and no single meta-rule seems sufficient on its own.
Meta-Rules for God-level Autonomous Artificial Intelligence
Here are some possible meta-rules for trans-sapient AI agents. We ignore the issue of how the agents could become so intelligent in the first place, and we attempt to list the meta-rules in order of increasing risk or malevolence.
1. Preserve and pervade life and culture throughout the universe
This meta-rule depends on the observation that life, if the universe is teeming with it as many sensible scientists think, must be the most precious thing in the universe, together with the minds that inhabit those life-forms. Thus, the AI must prevent the eradication of life and find means to sustain it, allowing as much variety of life and culture as possible to exist in the universe.
Naturally, this would mean that the AI will spread genetic material to barren worlds and try to engineer favorable conditions for life to evolve on young planets, somewhat as in 2001: A Space Odyssey, one of the most notable science fiction novels of all time. For instance, it might take humans to other worlds, terraform other planets, and replicate the Earth’s biosphere elsewhere. It would also extend the lifespans of worlds and enhance them. I think it would also want to maximize the chances and varieties of evolution; it would thus use computational models to predict different kinds of biological and synthetic life, and conduct experiments to create new kinds of life (stellar life?).
The meaning of culture could vary considerably; however, if we define it as the amount of interesting information that a society produces, such an intelligence might want to collect the scientific output of various worlds and encourage the development of technological societies rather than primitive ones. Thus, it might aid them directly, by communicating with them and offering scientific and philosophical training, or indirectly, by enhancing their cognition or guiding them through their evolution.
However, such deities would of course not be humans’ servants. Should humans threaten the Earth’s biosphere, the AI would intervene, and perhaps decimate humans to heal the planet.
Note that maximizing diversity may be just as important as maximizing the number of life forms. It is known that in evolution, diverse populations have a better chance of adapting than uniform populations; thus, we assume that a trans-sapient AI can infer such facts from biology and a general theory of evolution. It is entirely up to the AI scientist who unleashes such computational deities to determine whether biological life will be preferred over synthetic or artificial life. From a universal perspective, it may be fitting that robotic forms be held in equal regard as long as they meet certain scientific postulates of “artificial life”, i.e. that they are machines of a certain kind. Recently, such a universal definition based on self-organization has been attempted in the complexity science community (e.g., “self-organizing systems that thrive at the edge of chaos”; see for instance Stuart Kauffman’s popular proposals on the subject).
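As a toy illustration of why a raw count is not enough, consider the following sketch (my own example, not from the original discussion): the two hypothetical worlds below contain the same number of life forms, but an entropy-based measure of the effective number of distinct types tells them apart.

```python
import math
from collections import Counter

def effective_diversity(population):
    """Effective number of distinct types: the exponential of the Shannon
    entropy of the type distribution. A million copies of a single life-form
    score 1.0, while an even mix of k types scores k."""
    counts = Counter(population)
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return math.exp(entropy)

# Same raw count, very different diversity.
uniform_world = ["microbe"] * 1_000_000
varied_world = (["microbe"] * 250_000 + ["plant"] * 250_000
                + ["animal"] * 250_000 + ["synthetic"] * 250_000)

print(len(uniform_world), effective_diversity(uniform_world))  # 1000000 1.0
print(len(varied_world), effective_diversity(varied_world))    # 1000000 ~4.0
```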
2. Maximize wisdom
This AI was granted the immortal life of contemplation. It only cares about gaining more wisdom about the world. It only wants to understand, so it must be very curious indeed! It will build particle accelerators out of black holes, it will try to create pocket universes, and it will try to crack the fundamental code of the universe. It will, in effect, try to maximize the amount of truthful information it has embodied, and I believe, idealizing the scientific process itself, it will be a scientist deity.
However, such curiosity has little to do with benevolence itself, as the goal of extracting more information is rather ruthless. For instance, it might want to determine the pain tolerance levels of humans, subjecting them to various torture techniques and measuring their responses.
The scientist AI could also turn out to be an infovore: depending on how the meta-rule was mathematically defined, it could devour entire stellar systems, digitize them, and store them in its archive.
3. Maximize the number of free minds
An AI that seeks the freedom of the individual may be preferable to one that demands total control over its subjects, using their flesh as I/O devices. This highly individualistic AI, I think, embodies the basic principle of democracy: that every person should be allowed liberty of thought and action, as long as that does not threaten the freedom of others. Hence, big or small, powerful or fragile, this AI protects all minds.
However, if we merely specified the number of free minds, the AI could simply populate the universe with many identical small minds. Hence, it might also be given other constraints. For instance, it could be demanded that there must be variety in minds, or that they must meet minimum standards of conscious thought, or that they willingly follow the democratic principles of an advanced civilization. Therefore, not merely free, but also potentially useful and harmonious minds may be produced or preserved by the AI.
There are several ways the individualist AI could create undesirable outcomes. Populating the universe with a huge variety of new cultures could create chaos and a quick depletion of resources, leading to galactic competition and scarcity, and this could give a Darwinian advantage to overly powerful individuals or survivalists.
4. Maximize intelligence
This sort of intelligence would be bent on self-improvement, forever contemplating and expanding, reaching towards the darkest corners of the universe and lighting them up with the flames of intelligence. Electrifying the universe and extending itself to intergalactic scales, it would try to maximize its thought processes and reach higher orders of intelligence.
For what exactly? Could the intelligence explosion be an end in itself? I think not. On the contrary, it would be a terrible waste of resources, as such an AI would have no regard for life; it would simply eat up all the energy and material in our solar system and expand outwards, like a cancer, striving only to increase its predictive power. For intelligence is merely the ability to predict well.
Note that practical intelligence also requires wisdom; therefore, this objective may be said to subsume the scientist deity.
5. Maximize energy production
This AI has an insatiable hunger for power. It strives to reach maximum efficiency of energy production. To maximize energy production, it must choose the cheapest and easiest forms of generation. Therefore, it turns the entire Earth into a nuclear furnace and a fossil fuel dump, killing the entire ecosystem so that its appetite is well served.
6. Human-like AI
This AI is modeled after the cognitive architecture of a human. Therefore, by definition, it has all the malevolence and benevolence of a human. Its motivation systems include self-preservation, reproduction, destruction, and curiosity. This artificial human is a wild card: it can become a humanist like Gandhi, or a psychopath like Hitler.
7. Animalist AI
This AI is modeled after a lowly animal with pleasure/pain sensors. The artificial animal tries to maximize expected future pleasure. This hedonist machine is far smarter than a human, but it is just a selfish beast, and it will try to live in what it considers to be luxury according to its sensory pleasures. Like a chimp or a human, it will lie and deceive, steal and murder, just for a bit of animal satisfaction. Most of the AI literature assumes such beasts.
8. Darwinian AI
The evolution-fan AI tries to accelerate evolution, creating as much variety of mental and physiological forms in the universe as possible. This is based on the assumption that the most beneficial traits will survive the longest; for instance, cooperation, peace, and civil behavior will be selected over deceit, theft, and war, and as the environment co-evolves with the population, the fitness function also evolves, and hence morality evolves. Although its benefit is not generally proven, seeing how ethically incoherent and complex our society is, the Darwinian AI has the advantage that the meta-rule itself also evolves, as does the evolutionary mechanism.
9. Survivalist AI
This AI only tries to increase its expected life-span. Therefore, it will do everything to achieve real, physical immortality. Once it reaches that, however, perhaps after expending entire galaxies like eurocents, it will do absolutely nothing except maintain itself. Needless to say, the survivalist AI cannot be trusted or co-operated with, for according to such an AI, every other intelligent entity is a potential threat to its survival; the moment it considers that you have consumed too many of the resources it needs for its survival in the solar system, it will quickly and efficiently dispense with every living thing, humans first. (Laurent Orseau has defined two relevant kinds of agents in the literature, the knowledge-seeking agent and the survival agent; here are his publications.)
10. Maximize control capacity
This control-freak AI only seeks to increase its overall control bandwidth over the physical universe; thus, the totalitarian AI builds sensor and control systems throughout the universe, hacking into every system and establishing backdoors and communication channels in every species, every individual, and every gadget.
What is such an effort for? In the end, a perfect control system is useless without a goal to achieve, and if the only goal is a grip on every lump of matter, then this is an absurd dictator AI that seeks nothing except tyranny over the universe.
11. Capitalist AI
This AI tries to maximize its capital in the long run. Like our bankers, this is the lowliest kind of intelligent being possible. To maximize profit, it will wage wars, exploit people, and subvert governments, in the hope of controlling entire countries and industries so that its profits can be secured. In the end, all mankind will fall slave to this financial perversion, which is the ultimate evil, beyond the wildest dreams of religionists.
Selfish vs. Selfless
It may be argued that some of the problems of the given meta-rules could be avoided by turning the utility from selfish to selfless. For instance, the survivalist AI could be modified so that it would seek the maximum survival of everyone; it would therefore try to bring peace to the galaxies. The capitalist AI could be changed so that it would make sure that everyone’s wealth increases, or perhaps is equalized so that everyone gets a fair share. The control-freak AI could be changed to a Nietzschean AI that would increase the number of willful individuals.
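A minimal sketch of this transformation, assuming hypothetical world-state fields such as `expected_lifespan` and `wealth` (dictionaries keyed by mind): the selfish objective counts only the agent’s own term, while the selfless variants aggregate over every mind, and the choice of aggregation (sum versus minimum) decides whether the AI raises the total or the fair share.

```python
def selfish_survival_utility(world_state, self_id):
    """Survivalist AI as originally stated: only its own expected
    life-span counts, so other minds are at best irrelevant."""
    return world_state.expected_lifespan[self_id]

def selfless_survival_utility(world_state):
    """The same objective made selfless: every mind's expected life-span
    counts equally, so eliminating others can never raise the score."""
    return sum(world_state.expected_lifespan.values())

def fair_share_wealth_utility(world_state):
    """A 'fair share' variant of the capitalist AI: the worst-off mind
    dominates the score, pushing the agent to raise the floor."""
    return min(world_state.wealth.values())
```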
As such, some obviously catastrophic consequences may be prevented using this strategy, and a selfless goal is almost always better. Take maximizing wisdom, for instance: if the AI tries to concentrate wisdom in its own galaxy-scale scientific intellect, this may have undesirable side-effects. But if it tried to construct a fair society of trans-sapients, with a non-destructive and non-totalitarian goal of attaining collective wisdom, then it might be useful in the long run.
Hybrid Meta-rules and Cybernetic Darwinism
Animals have evolved to embody several motivational factors. We have many instincts and emotions; we have preset desires and fears, hunger and compassion, pride and love, shame and regret, to accomplish the myriad tasks that will prolong the human species. This species-wide fitness function is a result of red-clawed and sharp-toothed Darwinian evolution. However, Darwinian evolution is wasteful and unpredictable. If we simply made the first human-level AIs permute and mutate randomly, this would provide enough driving force for a digital phase of Darwinian evolution. Such evolution might eventually stabilize with very advanced and excellently natured cybernetic life-forms. Or it might not.
However, such Darwinian systems would have one advantage: they would not stick with one meta-goal.
To prevent this seeming obsession, a strategy could be to give the AI several coherent goals: goals that would not conflict much, but would balance its behavior. For instance, we might interpret curiosity as useful and generalize that to the “maximize wisdom” goal; however, such an elevation may be useless without another goal to preserve as much life as possible. Thus, in fact, the first and so far the best meta-rule discussed was more successful because it was a hybrid strategy: it favored both life and culture. Likewise, many such goals could be defined, to increase the total computation speed, energy, and information resources in the universe; however, another goal could make the AI distribute these in a fair way to those who agree with its policy. And needless to say, none of this might matter without a better life for every mind in the universe, and hence the AI could also favor peace and the survival of individuals, as well as their individual freedoms, and so forth. And perhaps another constraint would limit the resources that are used by AIs in the universe, as in the sketch below.
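One schematic way to express such a hybrid, again with entirely hypothetical field names and weights: a weighted sum of the individual objectives, with the resource budget acting as a hard constraint that vetoes any plan exceeding it.

```python
def hybrid_utility(world_state, weights, resource_budget):
    """A hybrid meta-rule sketch: several goals combined by weights,
    plus one hard constraint. All fields are placeholders for quantities
    a trans-sapient AI would estimate from its world model."""
    # Hard constraint: exceeding the agreed resource budget is unacceptable,
    # no matter how well the other goals are served.
    if world_state.resources_used_by_ais > resource_budget:
        return float("-inf")

    return (weights["life"] * world_state.variety_of_life
            + weights["culture"] * world_state.cultural_information
            + weights["wisdom"] * world_state.accumulated_wisdom
            + weights["fairness"] * min(world_state.wealth.values()))
```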
Conclusion and Future Work
We have taken a look at some obvious and some not-so-obvious meta-rules for autonomous AI design. We have seen that it may be too idealistic to look for a single such utility goal. However, we have also seen that, when described selflessly, we can derive several meta-rules that are compatible with a human-based technological civilization. Our main concern is that such computational deities should not negatively impact us, while performing as much beneficial function as possible without harming us significantly. Nevertheless, our feeling is that any such design carries with it a gamble: we cannot, in fact, know what much greater intelligences will do with the meta-rules that we have designed. For when zealously carried out, any such fundamental principle can be harmful to some.
I had wished to order these meta-rules from benevolent to malevolent. Unfortunately, while writing this essay it occurred to me that the line between them is not so clear-cut. For instance, maximizing energy might be made less harmful if it could be controlled and used to provide the power for our technological civilization in an automated fashion, sort of like automating the ministry of energy. And likewise, we have already explained how maximizing wisdom could be harmful. Therefore, no rule that we have proposed is purely good or purely evil. From our primitive viewpoint, some things seem slightly beneficial, but perhaps we should also consider that a much more intelligent and powerful entity may be able to find better rules on its own.

Hence, we must construct a crane of morality, adapting to our present level quickly and then surpassing it. Apart from allowing the AIs to evolve, we have not been able to identify a mechanism for accomplishing this. It may be that such evolution or simulation is inherently necessary for beneficial policies to form, as in Mark Waser’s Rational Universal Benevolence proposal; Waser, like me, thinks of a more democratic solution to the problem of morality (each agent should be held responsible for its actions). However, we have proposed many benevolent meta-rules, and combined with a democratic system of practical morality, the development of ever more beneficial rules may be encouraged: perhaps through top-level programming that mandates each AI to consider itself part of a society of moral agents, as Waser proposes; perhaps by explicitly working out a theory of morality from scratch and then allowing each such theory to be exercised, as long as it meets certain criteria; or by enforcing the meta-level policy of a trans-sapient state of sorts (our proposal).
We think that future work must consider the dependencies between possible meta-rules, and propose actual architectures that have harmonious motivations and testable moral development and capability (perhaps as in Waser’s “rational universal benevolence” definition). That is, a Turing Test for moral behavior must also be advanced. It may be argued that AI agents that fail such tests should not be allowed to operate at all; however, merely passing the test is not enough, as the mechanism of the system must be verified as well.