Deep reinforcement learning (DRL) is transitioning from a research field focused on game playing to a technology with real-world applications. Notable examples include DeepMind's work on controlling a nuclear reactor or on improving Youtube video compression, or Tesla attempting to use a method inspired by MuZero for autonomous vehicle behavior planning. But the exciting potential for real world applications of RL should also come with a healthy dose of caution – for example RL policies are well known to be vulnerable to exploitation, and methods for safe and robust policy development are an active area of research. At the same time as the emergence of powerful RL systems in the real world, the public and researchers are expressing an increased appetite for fair, aligned, and safe machine learning systems. The focus of these research efforts to date has been to account for shortcomings of datasets or supervised learning practices that can harm individuals.
Tampering problems, where an AI agent interferes with whatever represents or communicates its intended objective and pursues the resulting corrupted objective instead, are a staple concern in the AGI safety literature [Amodei et al., 2016, Bostrom, 2014, Everitt and Hutter, 2016, Everitt et al., 2017, Armstrong and O'Rourke, 2017, Everitt and Hutter, 2019, Armstrong et al., 2020]. Variations on the idea of tampering include wireheading, where an agent learns how to stimulate its reward mechanism directly, and the off-switch or shutdown problem, where an agent interferes with its supervisor's ability to halt the agent's operation. Many real-world concerns can be formulated as tampering problems, as we will show (§2.1, §4.1). However, what constitutes tampering can be tricky to define precisely, despite clear intuitions in specific cases. We have developed a platform, REALab, to model tampering problems.
Autonomous agents acting in the real-world often operate based on models that ignore certain aspects of the environment. The incompleteness of any given model---handcrafted or machine acquired---is inevitable due to practical limitations of any modeling technique for complex real-world settings. Due to the limited fidelity of its model, an agent's actions may have unexpected, undesirable consequences during execution. Learning to recognize and avoid such negative side effects of the agent's actions is critical to improving the safety and reliability of autonomous systems. This emerging research topic is attracting increased attention due to the increased deployment of AI systems and their broad societal impacts. This article provides a comprehensive overview of different forms of negative side effects and the recent research efforts to address them. We identify key characteristics of negative side effects, highlight the challenges in avoiding negative side effects, and discuss recently developed approaches, contrasting their benefits and limitations. We conclude with a discussion of open questions and suggestions for future research directions.
Deep Reinforcement Learning (DRL) is an avenue of research in Artificial Intelligence (AI) that has received increasing attention within the research community in recent years, and is beginning to show potential for real-world application. DRL is one of the most promising routes towards developing more autonomous AI systems that interact with and take actions in complex real-world environments, and can more flexibly solve a range of problems for which we may not be able to precisely specify a correct ‘answer’. This could have substantial implications for people’s lives: for example by speeding up automation in various sectors, changing the nature and potential harms of online influence, or introducing new safety risks in physical infrastructure. In this paper, we review recent progress in DRL, discuss how this may introduce novel and pressing issues for society, ethics, and governance, and highlight important avenues for future research to better understand DRL’s societal implications. This article appears in the special track on AI and Society.
We show how norms can be used to guide a reinforcement learning agent towards achieving normative behavior and apply the same set of norms over different domains. Thus, we are able to: (1) provide a way to intuitively encode social knowledge (through norms); (2) guide learning towards normative behaviors (through an automatic norm reward system); and (3) achieve a transfer of learning by abstracting policies; Finally, (4) the method is not dependent on a particular RL algorithm. We show how our approach can be seen as a means to achieve abstract representation and learn procedural knowledge based on the declarative semantics of norms and discuss possible implications of this in some areas of cognitive science. Index T erms --Norms, Institutions, Automatic Reward Shaping, Transfer of Learning, Abstract Policies, Abstraction, State-Space Selection, Schema I. I NTRODUCTION In order to be accepted in human society, robots need to comply with human social norms.