Avoiding Tampering Incentives in Deep RL via Decoupled Approval
Uesato, Jonathan; Kumar, Ramana; Krakovna, Victoria; Everitt, Tom; Ngo, Richard; Legg, Shane
arXiv.org Artificial Intelligence
If reinforcement learning (RL) agents are to have a large influence in society, it is essential that we have reliable mechanisms to communicate our preferences to these systems. In the standard RL paradigm, the role of communicating our preferences is played by the reward function. However, it may not be possible to restrict sufficiently general RL agents from modifying physical implementations of their reward function, or more generally tampering with whatever process produces inputs to the learning algorithm, instead of pursuing the intended goal. Our central concern is the tampering problem, which can be summarized as: How can we design agents that pursue a given objective when all feedback mechanisms for describing that objective are influenceable by the agent? As a simplified example, consider designing an automated personal assistant with the objective of being useful for its user.
Nov-17-2020
- Genre:
- Research Report (1.00)
- Technology: