Avoiding Side Effects in Complex Environments

Alexander Matt Turner, Neale Ratzlaff, Prasad Tadepalli

arXiv.org Artificial Intelligence 

Reward function specification can be difficult, even in simple environments. Realistic environments contain millions of states. Rewarding the agent for making a widget may be easy, but penalizing the multitude of possible negative side effects is hard. In toy environments, Attainable Utility Preservation (AUP) avoids side effects by penalizing shifts in the ability to achieve randomly generated goals. We scale this approach to large, randomly generated environments based on Conway's Game of Life. By preserving optimal value for a single randomly generated reward function, AUP incurs modest overhead, completes the specified task, and avoids side effects.
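The penalty the abstract describes can be read as a single shaping term: AUP replaces the task reward R(s, a) with R(s, a) minus a scaled change in attainable value for an auxiliary reward function, relative to inaction. The sketch below is one illustrative reading of that idea under stated assumptions, not the paper's implementation; the names q_aux, noop, and lam are hypothetical stand-ins for a learned auxiliary Q-function, the no-op action, and the penalty weight.

```python
def aup_reward(task_reward, q_aux, state, action, noop, lam=0.01):
    """AUP-style shaped reward for one transition (illustrative sketch).

    Penalizes the shift in attainable auxiliary value caused by the
    action, relative to inaction, normalized by the no-op value so the
    penalty is unit-free. `q_aux` stands in for a learned Q-function
    over an auxiliary (e.g. randomly generated) reward function.
    """
    shift = abs(q_aux(state, action) - q_aux(state, noop))
    scale = max(abs(q_aux(state, noop)), 1e-8)  # guard against division by zero
    return task_reward - lam * shift / scale


# Toy usage with a hypothetical auxiliary Q-function.
q_aux = lambda s, a: 1.0 + 0.1 * a
print(aup_reward(task_reward=1.0, q_aux=q_aux, state=0, action=2, noop=0))
# -> 0.998: the action shifts attainable auxiliary value, so it is mildly penalized.
```

Under this reading, an agent that gains task reward while drastically changing what it could achieve for the auxiliary goal pays a proportional cost, which is why a single randomly generated auxiliary reward can suffice in practice.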
