Absolutist AI

Jul-18-2023–arXiv.org Artificial Intelligence

Mitchell Barrington

Center for AI Safety, University of Michigan, University of Southern California

Abstract

This paper argues that training AI systems with absolute constraints, which forbid certain acts irrespective of the amount of value they might produce, may in principle make considerable progress on many AI safety problems. First, it provides a guardrail for avoiding the very worst outcomes of misalignment: an AI attempting to commit mass murder might have correctly deduced that doing so maximizes expected value, but more likely, the system is severely misaligned. Second, it could prevent AIs from causing catastrophes for the sake of very valuable consequences, such as replacing humans with a much larger number of beings living at a higher welfare level. Third, it makes systems more corrigible, allowing creators to make corrective interventions in them, such as altering their objective functions or shutting them down. And fourth, it helps systems explore their environment more safely by prohibiting them from exploring especially dangerous acts. I offer a decision-theoretic formalization of absolute constraints, improving on existing models in the literature, and use this model to prove some results about the training and behavior of absolutist AIs. I conclude by showing that, although absolutist AIs will not maximize expected value, they will not be susceptible to behaving irrationally, and they will not (contra coherence arguments) face environmental pressure to become expected-value maximizers.

Introduction

Advanced AI systems are expected to be dangerous because of the opacity of their goals: we may know that they will effectively pursue their goals but fail to know what those goals are.

artificial intelligence, machine learning, reinforcement learning


Country:
- North America > United States
  - Michigan (0.24)
  - New York (0.04)
  - California > San Diego County
    - San Diego (0.04)
- Europe > United Kingdom
  - England
    - Oxfordshire > Oxford (0.05)
    - Cambridgeshire > Cambridge (0.04)
- Asia > Middle East
  - Jordan (0.04)

Genre:
- Research Report (0.84)

Industry:
- Law (0.68)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)
