Overoptimization Failures and Specification Gaming in Multi-agent Systems
arXiv.org Artificial Intelligence
In this paper, we show that even if artificial intelligence (AI) or machine learning (ML) systems are individually well-aligned with a goal, specific classes of over-optimization failures can create dynamics in multiparty systems that lead to new failure modes. Even specifying noncompetitive or cooperative goals does not guarantee how such systems will behave. By outlining how and why these multi-agent failures can occur, the paper hopes to spur system designers to explicitly consider these failure modes when designing systems, and to find approaches for mitigating them.

When complex systems are optimized by a single agent, the representation of the system and of the goal used for optimization often leads to failures that can be surprising to the agent's designers. These failure modes have been referred to as Goodhart's law [1, 2], Campbell's law [3], faulty reward functions [4], distributional shift [4], reward hacking [5], Proxyeconomics [6], and presumably by many other terms. Such failure modes are the focus of a significant body of work in AI safety, and progress has been made.
Oct-31-2018
- Country:
- North America > United States (0.47)
- Genre:
- Research Report (0.64)
- Industry:
- Leisure & Entertainment > Games (1.00)
- Energy > Power Industry (0.68)
- Information Technology (0.68)
- Technology: