Modeling AGI Safety Frameworks with Causal Influence Diagrams

Everitt, Tom, Kumar, Ramana, Krakovna, Victoria, Legg, Shane

Jun-20-2019–arXiv.org Artificial Intelligence

One of the primary goals of AI research is the development of artificial agents that can exceed human performance on a wide range of cognitive tasks; in other words, artificial general intelligence (AGI). Although the development of AGI has many potential benefits, many safety concerns have also been raised in the literature [Bostrom, 2014; Everitt et al., 2018; Amodei et al., 2016]. Various approaches for addressing AGI safety have been proposed [Leike et al., 2018; Christiano et al., 2018; Irving et al., 2018; Hadfield-Menell et al., 2016; Everitt, 2018], often presented as a modification of the reinforcement learning (RL) framework or as a new framework altogether. Understanding and comparing different frameworks for AGI safety can be difficult because they build on differing concepts and assumptions. For example, both reward modeling [Leike et al., 2018] and cooperative inverse RL [Hadfield-Menell et al., 2016] are frameworks for making an agent learn the preferences of a human user, but what are the key differences between them?

artificial intelligence, machine learning, reinforcement learning

Country:
- Europe > United Kingdom > England
  - Oxfordshire > Oxford (0.14)
  - Cambridgeshire > Cambridge (0.04)

Genre:
- Research Report (0.50)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Uncertainty
    - Bayesian Inference (0.54)
  - Machine Learning
    - Reinforcement Learning (1.00)
    - Learning Graphical Models > Directed Networks
      - Bayesian Learning (0.54)
