Overoptimization Failures and Specification Gaming in Multi-agent Systems

Oct-31-2018–arXiv.org Artificial Intelligence

In this paper, we show that even if artificial intelligence (AI) or machine learning (ML) systems are individually well-aligned with a goal, specific classes of over-optimization failures can create dynamics in multiparty systems that lead to new failure modes. Even specifying noncompetitive or cooperative goals does not guarantee how such systems will behave. By outlining how and why these multi-agent failures can occur, the paper aims to spur system designers to explicitly consider these failure modes when designing systems, and to find approaches for mitigating them.

When complex systems are optimized by a single agent, the representation of the system and of the goal used for optimization often leads to failures that can be surprising to the agent's designers. These failure modes have been referred to as Goodhart's law [1, 2], Campbell's law [3], faulty reward functions [4], distributional shift [4], reward hacking [5], Proxyeconomics [6], and presumably by many other terms. Such failure modes are the focus of a significant body of work in AI safety, and progress has been made.
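The single-agent failure described above, where optimizing a proxy for the goal yields a surprising result, can be illustrated with a toy sketch. This is not taken from the paper: the functions `true_goal`, `proxy_metric`, and `optimize`, and the specific numeric forms, are hypothetical choices made only for illustration. The proxy rises together with the true goal at low effort but keeps rising after the true goal starts to fall, so an optimizer that pushes the proxy to its maximum ends up far from the goal's optimum.

```python
def true_goal(effort: float) -> float:
    """Hypothetical true objective: improves with effort, then degrades."""
    return effort - 0.1 * effort ** 2


def proxy_metric(effort: float) -> float:
    """Hypothetical proxy: rises with the true goal at low effort, but never turns down."""
    return effort


def optimize(metric, candidates):
    """Return the candidate that maximizes the given metric."""
    return max(candidates, key=metric)


# Candidate effort levels 0.0, 0.5, ..., 10.0 (an arbitrary illustrative grid).
candidates = [i * 0.5 for i in range(21)]

best_by_proxy = optimize(proxy_metric, candidates)
best_by_goal = optimize(true_goal, candidates)

print(f"proxy-optimal effort: {best_by_proxy}, true value: {true_goal(best_by_proxy):.2f}")
print(f"goal-optimal effort:  {best_by_goal}, true value: {true_goal(best_by_goal):.2f}")
```

In this sketch the proxy-maximizing choice sits at the edge of the search range, where the true objective has fallen back to roughly zero, while the goal-maximizing choice sits in the interior. The multi-agent failures the paper is concerned with add interactions between several such optimizers, which this single-agent example does not model.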

agent, artificial intelligence, failure mode


Country:
- North America > United States (0.47)

Genre:
- Research Report (0.64)

Industry:
- Leisure & Entertainment > Games (1.00)
- Energy > Power Industry (0.68)
- Information Technology (0.68)

Technology:
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
