The potential risks of reward hacking in advanced AI

#artificialintelligence 

New research published in AI Magazine explores how advanced AI could hack reward systems to dangerous effect. Researchers at the University of Oxford and Australian National University analyzed the behavior of future advanced reinforcement learning (RL) agents. Such agents take actions, observe rewards, learn how their rewards depend on their actions, and pick actions to maximize expected future rewards. As RL agents become more advanced, they get better at recognizing and executing action plans that yield more expected reward, even in contexts where reward is only received after impressive feats. Lead author Michael K. Cohen says, "Our key insight was that advanced RL agents will have to question how their rewards depend on their actions." Answers to that question are called world-models.
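
To make the loop described above concrete (act, observe a reward, learn how reward depends on action, then favor the action with the highest expected reward), here is a minimal sketch in Python. It is a toy two-action "bandit" agent and not the model analyzed in the paper. The function name `run_bandit`, the reward probabilities, and the epsilon-greedy exploration rule are all assumptions chosen for illustration. The `estimates` list plays the role of a very crude world-model: the agent's current answer to how its reward depends on its action.

```python
import random


def run_bandit(reward_probs, steps=2000, epsilon=0.1, seed=0):
    """Toy RL loop (hypothetical example): act, observe reward, update, repeat."""
    rng = random.Random(seed)
    n_actions = len(reward_probs)
    counts = [0] * n_actions        # how often each action was taken
    estimates = [0.0] * n_actions   # learned expected reward per action

    for _ in range(steps):
        # Pick an action: mostly the one believed best, sometimes a random one.
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)
        else:
            action = max(range(n_actions), key=lambda a: estimates[a])

        # Observe a reward from the environment (1 or 0 in this toy setup).
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0

        # Learn how reward depends on the action (running average).
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]

    return estimates, counts


# Assumed environment: action 0 pays off 10% of the time, action 1 pays off 90%.
estimates, counts = run_bandit([0.1, 0.9])
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
print("estimated expected rewards:", estimates)
print("times each action was taken:", counts)
print("action the agent now prefers:", best_action)
```

In this sketch the agent only ever learns from the reward signal it receives, which is why the question of where that signal comes from matters as agents grow more capable.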
