The AI off-switch problem as a signalling game: bounded rationality and incomparability

Benavoli, Alessio, Facchini, Alessandro, Zaffalon, Marco

Feb-11-2025–arXiv.org Artificial Intelligence

The off-switch problem is a critical challenge in AI control: if an AI system resists being switched off, it poses a significant risk. In this paper, we model the off-switch problem as a signalling game, where a human decision-maker communicates its preferences about some underlying decision problem to an AI agent, which then selects actions to maximise the human's utility. We assume that the human is a boundedly rational agent and explore various bounded rationality mechanisms. Using real machine learning models, we reprove prior results and demonstrate that a necessary condition for an AI system to refrain from disabling its off-switch is its uncertainty about the human's utility. We also analyse how message costs influence optimal strategies and extend the analysis to scenarios involving incomparability.
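
To see why uncertainty about the human's utility matters, the following is a minimal sketch of a simplified off-switch setting, not the signalling-game model of the paper. It assumes a fully rational human who permits the AI's action exactly when its utility u is positive, so an AI that defers receives max(u, 0), whereas an AI that bypasses the human receives either u (act) or 0 (switch off). The function name `incentive_to_defer` and the Gaussian belief are illustrative assumptions.

```python
import random
import statistics


def incentive_to_defer(utility_samples):
    """Estimate E[max(u, 0)] - max(E[u], 0) from samples of the AI's belief.

    Assumption (not from the paper): the human is fully rational and lets
    the action proceed only when u > 0, so deferring yields max(u, 0).
    """
    # Expected utility if the AI leaves the off-switch in the human's hands.
    defer = statistics.fmean(max(u, 0.0) for u in utility_samples)
    # Best the AI can do without the human: act (E[u]) or switch off (0).
    act_or_off = max(statistics.fmean(utility_samples), 0.0)
    return defer - act_or_off


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical belief: the AI is unsure about u (Gaussian, mean 0.5, sd 1).
uncertain = [rng.gauss(0.5, 1.0) for _ in range(10_000)]
# Degenerate belief: the AI is certain that u = 0.5.
certain = [0.5] * 10_000

gain_uncertain = incentive_to_defer(uncertain)
gain_certain = incentive_to_defer(certain)

print(f"gain from deferring (uncertain belief): {gain_uncertain:.3f}")
print(f"gain from deferring (certain belief):   {gain_certain:.3f}")
```

Under these assumptions the gain from deferring is strictly positive for the uncertain belief and vanishes for the certain one, which matches the abstract's statement that uncertainty is a necessary condition. Bounded rationality, message costs, and incomparability are not captured by this sketch.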

artificial intelligence, machine learning, payoff, (20 more...)

Country:
- Oceania > Australia
  - Victoria > Melbourne (0.04)
- North America > United States
  - Massachusetts > Middlesex County > Cambridge (0.04)
- Europe
  - Switzerland (0.04)
  - United Kingdom > England
    - Cambridgeshire > Cambridge (0.04)
  - Portugal > Lisbon
    - Lisbon (0.04)
  - Ireland > Leinster
    - County Dublin > Dublin (0.14)

Genre:
- Research Report (0.40)

Industry:
- Leisure & Entertainment > Games (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning (1.00)
  - Representation & Reasoning > Uncertainty
    - Bayesian Inference (0.46)
