Avoiding Tampering Incentives in Deep RL via Decoupled Approval

Jonathan Uesato, Ramana Kumar, Victoria Krakovna, Tom Everitt, Richard Ngo, Shane Legg

arXiv.org Artificial Intelligence 

If reinforcement learning (RL) agents are to have a large influence in society, it is essential that we have reliable mechanisms to communicate our preferences to these systems. In the standard RL paradigm, the role of communicating our preferences is played by the reward function. However, it may not be possible to restrict sufficiently general RL agents from modifying physical implementations of their reward function, or more generally tampering with whatever process produces inputs to the learning algorithm, instead of pursuing the intended goal. Our central concern is the tampering problem, which can be summarized as: How can we design agents that pursue a given objective when all feedback mechanisms for describing that objective are influenceable by the agent? As a simplified example, consider designing an automated personal assistant with the objective of being useful for its user.
