CoinRun: Solving Goal Misgeneralisation

Armstrong, Stuart, Maranhão, Alexandre, Daniels-Koch, Oliver, Leask, Patrick, Gorman, Rebecca

Nov-1-2023–arXiv.org Artificial Intelligence

Goal misgeneralisation is a key challenge in AI alignment, the task of getting powerful Artificial Intelligences to align their goals with human intentions and human morality. In this paper, we show how the ACE (Algorithm for Concept Extrapolation) agent can solve one of the key standard challenges in goal misgeneralisation: the CoinRun challenge. The agent uses no new reward information in the new environment. This points to how autonomous agents could be trusted to act in human interests, even in novel and critical situations.

artificial intelligence, coinrun, goal misgeneralisation

Genre:
- Research Report (0.40)

Technology:
- Information Technology > Artificial Intelligence (1.00)
