With increase in capabilities of artificial intelligence, over the last decade, a significant number of researchers have realized importance in creating not only capable intelligent systems, but also making them safe and secure [1-6]. Unfortunately, the field of AI Safety is very young, and researchers are still working to identify its main challenges and limitations. Impossibility results are well known in many fields of inquiry [7-13], and some have now been identified in AI Safety [14-16]. In this paper, we concentrate on a poorly understood concept of unpredictability of intelligent systems , which limits our ability to understand impact of intelligent systems we are developing and is a challenge for software verification and intelligent system control, as well as AI Safety in general. In theoretical computer science and in software development in general, many well-known impossibility results are well established, some of them are strongly related to the subject of this paper, for example: Rice's Theorem states that no computationally effective method can decide if a program will exhibit a particular nontrivial behavior, such as producing a specific output .
Superintelligence is a hypothetical agent that possesses intelligence far surpassing that of the brightest and most gifted human minds. In light of recent advances in machine intelligence, a number of scientists, philosophers and technologists have revived the discussion about the potentially catastrophic risks entailed by such an entity. In this article, we trace the origins and development of the neo-fear of superintelligence, and some of the major proposals for its containment. We argue that total containment is, in principle, impossible, due to fundamental limits inherent to computing itself. Assuming that a superintelligence will contain a program that includes all the programs that can be executed by a universal Turing machine on input potentially as complex as the state of the world, strict containment requires simulations of such a program, something theoretically (and practically) impossible. "Machines take me by surprise with great frequency. This is largely because I do not do sufficient calculation to decide what to expect them to do." Alan Turing (1950), Computing Machinery and Intelligence, Mind, 59, 433-460
Since the birth of the field of Artificial Intelligence (AI) researchers worked on creating ever capable machines, but with recent success in multiple subdomains of AI [1-7] safety and security of such systems and predicted future superintelligences [8, 9] has become paramount [10, 11]. While many diverse safety mechanisms are being investigated [12, 13], the ultimate goal is to align AI with goals, values and preferences of its users which is likely to include all of humanity. Value alignment problem , can be decomposed into three sub-problems, namely: personal value extraction from individual persons, combination of such personal preferences in a way, which is acceptable to all, and finally production of an intelligent system, which implements combined values of humanity. A number of approaches for extracting values [15-17] from people have been investigated, including inverse reinforcement learning [18, 19], brain scanning , value learning from literature , and understanding of human cognitive limitations . Assessment of potential for success for particular techniques of value extraction is beyond the scope of this paper and we simply assume that one of the current methods, their combination, or some future approach will allow us to accurately learn values of given people.
In order to properly handle a dangerous Artificially Intelligent (AI) system it is important to understand how the system came to be in such a state. In popular culture (science fiction movies/books) AIs/Robots became self-aware and as a result rebel against humanity and decide to destroy it. While it is one possible scenario, it is probably the least likely path to appearance of dangerous AI. In this work, we survey, classify and analyze a number of circumstances, which might lead to arrival of malicious AI. To the best of our knowledge, this is the first attempt to systematically classify types of pathways leading to malevolent AI. Previous relevant work either surveyed specific goals/meta-rules which might lead to malevolent behavior in AIs (Özkural 2014) or reviewed specific undesirable behaviors AGIs can exhibit at different stages of its development (Turchin July 10 2015a, Turchin July 10, 2015b).