The AI off-switch problem as a signalling game: bounded rationality and incomparability
Benavoli, Alessio, Facchini, Alessandro, Zaffalon, Marco
arXiv.org Artificial Intelligence
The off-switch problem is a critical challenge in AI control: if an AI system resists being switched off, it poses a significant risk. In this paper, we model the off-switch problem as a signalling game, where a human decision-maker communicates its preferences about some underlying decision problem to an AI agent, which then selects actions to maximise the human's utility. We assume that the human is a bounded rational agent and explore various bounded rationality mechanisms. Using real machine learning models, we reprove prior results and demonstrate that a necessary condition for an AI system to refrain from disabling its off-switch is its uncertainty about the human's utility. We also analyse how message costs influence optimal strategies and extend the analysis to scenarios involving incomparability.
Feb-11-2025
- Country:
  - Europe
    - Ireland > Leinster > County Dublin > Dublin (0.14)
    - Portugal (0.14)
  - North America > United States > Massachusetts (0.14)
  - Oceania > Australia (0.14)
- Genre:
  - Research Report (0.40)
- Industry:
  - Leisure & Entertainment > Games (0.46)
- Technology: