The potential risks of reward hacking in advanced AI

Sep-15-2022, 03:10:27 GMT–#artificialintelligence

New research published in AI Magazine explores how advanced AI could hack reward systems to dangerous effect. Researchers at the University of Oxford and Australian National University analyzed the behavior of future advanced reinforcement learning (RL) agents. Such agents take actions, observe rewards, learn how their rewards depend on their actions, and pick actions to maximize expected future rewards. As RL agents become more advanced, they get better at recognizing and executing action plans that yield higher expected reward, even in contexts where reward is only received after impressive feats. Lead author Michael K. Cohen says, "Our key insight was that advanced RL agents will have to question how their rewards depend on their actions." Answers to that question are called world-models.
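The act, observe, learn, choose loop described above can be illustrated with a toy example. The following is a minimal sketch in Python, not the method from the paper: it assumes a simplified one-step setting in which "expected future reward" reduces to the average immediate reward per action. The names run_agent and toy_reward, the action labels, and the reward values are all hypothetical and chosen only for illustration.

```python
import random


def run_agent(reward_fn, actions, steps=500, epsilon=0.1, seed=0):
    """Toy RL loop: act, observe a reward, update estimates, act greedily.

    Assumption: a one-step setting, so each action's value is simply
    the running average of the rewards observed after taking it.
    """
    rng = random.Random(seed)
    totals = {a: 0.0 for a in actions}  # summed reward per action
    counts = {a: 0 for a in actions}    # times each action was taken

    def estimate(a):
        # The agent's learned answer to "how does reward depend on my action?"
        return totals[a] / counts[a] if counts[a] else 0.0

    for _ in range(steps):
        untried = [a for a in actions if counts[a] == 0]
        if untried:
            action = untried[0]                   # try every action once
        elif rng.random() < epsilon:
            action = rng.choice(actions)          # occasional exploration
        else:
            action = max(actions, key=estimate)   # maximize estimated reward
        reward = reward_fn(action, rng)           # observe the reward
        totals[action] += reward                  # learn from it
        counts[action] += 1

    return {a: estimate(a) for a in actions}


def toy_reward(action, rng):
    # Hypothetical environment: "right" pays more on average than "left".
    means = {"left": 0.2, "right": 0.8}
    return means[action] + rng.uniform(-0.1, 0.1)


estimates = run_agent(toy_reward, ["left", "right"])
best_action = max(estimates, key=estimates.get)
```

Here the table of per-action averages plays the role of a very crude world-model: it is the agent's current answer to how its rewards depend on its actions. The agents discussed in the research are assumed to be far more capable than this sketch.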

artificial intelligence, machine learning, reinforcement learning
