Avoiding Tampering Incentives in Deep RL via Decoupled Approval

Jonathan Uesato, Ramana Kumar, Victoria Krakovna, Tom Everitt, Richard Ngo, Shane Legg

Nov-17-2020–arXiv.org Artificial Intelligence

If reinforcement learning (RL) agents are to have a large influence in society, it is essential that we have reliable mechanisms to communicate our preferences to these systems. In the standard RL paradigm, the role of communicating our preferences is played by the reward function. However, it may not be possible to restrict sufficiently general RL agents from modifying physical implementations of their reward function, or more generally tampering with whatever process produces inputs to the learning algorithm, instead of pursuing the intended goal. Our central concern is the tampering problem, which can be summarized as: How can we design agents that pursue a given objective when all feedback mechanisms for describing that objective are influenceable by the agent? As a simplified example, consider designing an automated personal assistant with the objective of being useful for its user.

Country:
- North America > United States (0.04)
- Europe > United Kingdom
  - England
    - Oxfordshire > Oxford (0.04)
    - Greater London > London (0.04)
- Asia > Middle East
  - Jordan (0.04)

Genre:
- Research Report (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Machine Learning > Reinforcement Learning (1.00)
