Modeling AGI Safety Frameworks with Causal Influence Diagrams
Tom Everitt, Ramana Kumar, Victoria Krakovna, Shane Legg
arXiv.org Artificial Intelligence
One of the primary goals of AI research is the development of artificial general intelligence (AGI): artificial agents that can exceed human performance on a wide range of cognitive tasks. Although AGI has many potential benefits, many safety concerns have also been raised in the literature [Bostrom, 2014; Everitt et al., 2018; Amodei et al., 2016]. Various approaches to AGI safety have been proposed [Leike et al., 2018; Christiano et al., 2018; Irving et al., 2018; Hadfield-Menell et al., 2016; Everitt, 2018], often presented as modifications of the reinforcement learning (RL) framework or as entirely new frameworks. Understanding and comparing these frameworks can be difficult because they build on differing concepts and assumptions. For example, both reward modeling [Leike et al., 2018] and cooperative inverse RL [Hadfield-Menell et al., 2016] are frameworks for making an agent learn the preferences of a human user, but what are the key differences between them?
Jun-20-2019