Collaborating Authors

Yampolskiy, Roman V.

On Controllability of AI

Artificial Intelligence

The unprecedented progress in Artificial Intelligence (AI) [1-6] over the last decade came alongside multiple AI failures [7, 8] and cases of dual use [9], causing a realization [10] that it is not sufficient to create highly capable machines, but that it is even more important to make sure that intelligent machines are beneficial [11] for humanity. This led to the birth of a new subfield of research commonly known as AI Safety and Security [12], with hundreds of papers and books published annually on different aspects of the problem [13-31]. All such research is done under the assumption that the problem of controlling highly capable intelligent machines is solvable, which has not been established by any rigorous means. However, it is standard practice in computer science to first show that a problem doesn't belong to a class of unsolvable problems [32, 33] before investing resources into trying to solve it or deciding what approaches to try. Unfortunately, to the best of our knowledge, no mathematical proof or even rigorous argumentation has been published demonstrating that the AI control problem may be solvable, even in principle, much less in practice. Or as Gans puts it, citing Bostrom: "Thusfar, AI researchers and philosophers have not been able to come up with methods of control that would ensure [bad] outcomes did not take place …" [34].

Human $\neq$ AGI

Artificial Intelligence

The terms Artificial General Intelligence (AGI) and Human-Level Artificial Intelligence (HLAI) have been used interchangeably to refer to the Holy Grail of Artificial Intelligence (AI) research: the creation of a machine capable of achieving goals in a wide range of environments. However, the widespread implicit assumption of equivalence between the capabilities of AGI and HLAI appears to be unjustified, as humans are not general intelligences. In this paper, we will prove this distinction.

Unpredictability of AI

Artificial Intelligence

With the increase in the capabilities of artificial intelligence over the last decade, a significant number of researchers have realized the importance of creating intelligent systems that are not only capable but also safe and secure [1-6]. Unfortunately, the field of AI Safety is very young, and researchers are still working to identify its main challenges and limitations. Impossibility results are well known in many fields of inquiry [7-13], and some have now been identified in AI Safety [14-16]. In this paper, we concentrate on a poorly understood concept, the unpredictability of intelligent systems [17], which limits our ability to understand the impact of the intelligent systems we are developing and is a challenge for software verification and intelligent system control, as well as for AI Safety in general. In theoretical computer science and in software development in general, many impossibility results are well established, and some of them are strongly related to the subject of this paper. For example, Rice's Theorem states that no computationally effective method can decide if a program will exhibit a particular nontrivial behavior, such as producing a specific output [18].

Personal Universes: A Solution to the Multi-Agent Value Alignment Problem

Artificial Intelligence

Since the birth of the field of Artificial Intelligence (AI), researchers have worked on creating ever more capable machines, but with recent success in multiple subdomains of AI [1-7], the safety and security of such systems and of predicted future superintelligences [8, 9] have become paramount [10, 11]. While many diverse safety mechanisms are being investigated [12, 13], the ultimate goal is to align AI with the goals, values and preferences of its users, which is likely to include all of humanity. The value alignment problem [14] can be decomposed into three sub-problems, namely: personal value extraction from individual persons, combination of such personal preferences in a way that is acceptable to all, and finally production of an intelligent system that implements the combined values of humanity. A number of approaches for extracting values [15-17] from people have been investigated, including inverse reinforcement learning [18, 19], brain scanning [20], value learning from literature [21], and understanding of human cognitive limitations [22]. Assessment of the potential for success of particular techniques of value extraction is beyond the scope of this paper, and we simply assume that one of the current methods, their combination, or some future approach will allow us to accurately learn the values of given people.

Emergence of Addictive Behaviors in Reinforcement Learning Agents

Artificial Intelligence

This paper presents a novel approach to the technical analysis of wireheading in intelligent agents. Inspired by the natural analogues of wireheading and their prevalent manifestations, we propose the modeling of such phenomena in Reinforcement Learning (RL) agents as psychological disorders. In a preliminary step towards evaluating this proposal, we study the feasibility and dynamics of emergent addictive policies in Q-learning agents in the tractable environment of the game of Snake. We consider a slightly modified setting for this game, in which the environment provides a "drug" seed alongside the original "healthy" seed for the consumption of the snake. We adopt and extend an RL-based model of natural addiction to Q-learning agents in this setting, and derive sufficient parametric conditions for the emergence of addictive behaviors in such agents. Furthermore, we evaluate our theoretical analysis with three sets of simulation-based experiments. The results demonstrate the feasibility of addictive wireheading in RL agents, and provide promising avenues of further research on the psychopathological modeling of complex AI safety problems.
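As a rough illustration only, the sketch below shows the standard tabular Q-learning update applied to a single-state choice between a "healthy" seed and a "drug" seed. It is not the addiction model or the Snake environment from the paper: the reward values, the function name `train_q_values`, the round-robin exploration schedule, and the parameters `alpha`, `gamma`, and `steps` are all assumptions made for this example.

```python
def train_q_values(rewards, alpha=0.1, gamma=0.9, steps=500):
    """Tabular Q-learning in a hypothetical single-state environment.

    rewards: dict mapping each action name to its (assumed) immediate reward.
    Returns the learned Q-value for each action.
    """
    q = {action: 0.0 for action in rewards}
    actions = list(rewards)
    for t in range(steps):
        # Deterministic round-robin exploration (an assumption made to keep
        # the example reproducible; the paper does not specify this).
        action = actions[t % len(actions)]
        # Standard Q-learning target: immediate reward plus the discounted
        # value of the best action in the (single) next state.
        target = rewards[action] + gamma * max(q.values())
        q[action] += alpha * (target - q[action])
    return q


# Hypothetical rewards: the "drug" seed pays more per step than the "healthy" seed.
rewards = {"healthy_seed": 1.0, "drug_seed": 3.0}
q = train_q_values(rewards)
greedy = max(q, key=q.get)
print(q, greedy)
```

Under these assumed rewards, the greedy policy settles on the higher-paying "drug" seed. The parametric conditions for addictive behavior derived in the paper concern a richer model and are not reproduced here.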

Uploading Brain into Computer: Whom to Upload First?

Artificial Intelligence

As we write this paper, there is a team of researchers working toward the creation of a "Brain Simulation Platform", software that will map the human brain down to a minute level of detail (see This research has incredible implications for many scientific fields of study. The completion of this project will also represent the completion of the first two criteria set forth by Anders Sandberg and Nick Bostrom in their paper Whole Brain Emulation: A Roadmap [1], which would imply that we will be well on our way toward our first functional brain emulation. With the apparent imminence of at least a simplistic version of whole brain emulation, we must begin to consider some implications for the future. The goal of whole brain emulation is the eventual use of the technology to emulate a human mind.

Human Indignity: From Legal AI Personhood to Selfish Memes

Artificial Intelligence

Debates about rights are frequently framed around the concept of legal personhood, which is granted not just to human beings but also to some nonhuman entities, such as firms, corporations or governments. Legal entities, also known as legal persons, are granted certain privileges and responsibilities by the jurisdictions in which they are recognized, and many such rights are not available to nonperson agents. Attempting to secure legal personhood is often seen as a potential pathway to obtaining certain rights and protections for animals [1], fetuses [2], trees, rivers [3] and artificially intelligent (AI) agents [4]. It is commonly believed that a court ruling or a legislative action is necessary to grant personhood to a new type of entity, but recent legal literature [5-8] suggests that loopholes in the current law may permit the granting of legal personhood to currently existing AI/software without having to change the law or persuade any court.

A Psychopathological Approach to Safety Engineering in AI and AGI

Artificial Intelligence

The complexity of dynamics in AI techniques is already approaching that of complex adaptive systems, thus curtailing the feasibility of formal controllability and reachability analysis in the context of AI safety. It follows that the envisioned instances of Artificial General Intelligence (AGI) will also suffer from challenges of complexity. To tackle such issues, we propose the modeling of deleterious behaviors in AI and AGI as psychological disorders, thereby enabling the employment of psychopathological approaches to analysis and control of misbehaviors. Accordingly, we present a discussion on the feasibility of the psychopathological approaches to AI safety, and propose general directions for research on modeling, diagnosis, and treatment of psychological disorders in AGI.

Unethical Research: How to Create a Malevolent Artificial Intelligence

Artificial Intelligence

Cybersecurity research involves publishing papers about malicious exploits as much as publishing information on how to design tools to protect cyber-infrastructure. It is this information exchange between ethical hackers and security experts that results in a well-balanced cyber-ecosystem. In the blooming domain of AI Safety Engineering, hundreds of papers have been published on different proposals aimed at the creation of a safe machine, yet nothing, to our knowledge, has been published on how to design a malevolent machine. Availability of such information would be of great value, particularly to computer scientists, mathematicians, and others who have an interest in AI safety and who are attempting to avoid the spontaneous emergence or the deliberate creation of a dangerous AI, which can negatively affect human activities and in the worst case cause the complete obliteration of the human species. This paper provides some general guidelines for the creation of a Malevolent Artificial Intelligence (MAI).

Taxonomy of Pathways to Dangerous Artificial Intelligence

AAAI Conferences

In order to properly handle a dangerous Artificially Intelligent (AI) system, it is important to understand how the system came to be in such a state. In popular culture (science fiction movies/books), AIs/Robots become self-aware and, as a result, rebel against humanity and decide to destroy it. While this is one possible scenario, it is probably the least likely path to the appearance of dangerous AI. In this work, we survey, classify and analyze a number of circumstances that might lead to the arrival of malicious AI. To the best of our knowledge, this is the first attempt to systematically classify types of pathways leading to malevolent AI. Previous relevant work either surveyed specific goals/meta-rules which might lead to malevolent behavior in AIs (Özkural 2014) or reviewed specific undesirable behaviors AGIs can exhibit at different stages of their development (Turchin July 10, 2015a, Turchin July 10, 2015b).