Machine Theory of Mind and the Structure of Human Values

arXiv.org Artificial Intelligence, May 28, 2025

Value learning is a crucial aspect of safe and ethical AI. It is pursued primarily through methods that infer human values from behaviour. However, humans care about far more than they can demonstrate through their actions. Consequently, an AI must predict the rest of these seemingly complex values from a limited sample. I call this the value generalization problem. In this paper, I argue that human values have a generative rational structure and that this structure allows us to solve the value generalization problem. In particular, we can use Bayesian Theory of Mind models to infer human values not only from behaviour but also from other values. This possibility has been obscured by the widespread use of simple utility functions to represent human values. I conclude that developing generative value-to-value inference is a crucial component of achieving a scalable machine theory of mind.
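
The idea that a shared generative structure lets a few observations predict many unobserved values can be illustrated with a toy Bayesian inference. This is a minimal sketch under our own assumptions, not the model from the paper: each hypothesis is a hypothetical "core" value profile (weights over two abstract features), every specific preference is assumed to follow from those same weights, and observations are scored with a Boltzmann-rational likelihood, a common choice in Bayesian inverse planning. The hypothesis names, feature vectors, outcomes, and the inverse temperature `beta` are all illustrative inventions.

```python
import math

# Hypothetical outcomes described by (health, wealth) feature vectors.
FEATURES = {
    "exercise": (1.0, 0.0),
    "overtime": (-0.5, 1.0),
    "salad": (0.8, -0.1),
    "fast_food": (-0.6, 0.2),
}

# Hypothetical core value profiles: weights over the same two features.
# Assumption: every specific preference is generated by one of these.
HYPOTHESES = {
    "health_first": (1.0, 0.2),
    "wealth_first": (0.2, 1.0),
}


def utility(weights, outcome):
    """Linear utility of an outcome under a core value profile."""
    return sum(w * f for w, f in zip(weights, FEATURES[outcome]))


def choice_prob(weights, chosen, options, beta=3.0):
    """Boltzmann-rational probability of preferring `chosen` among `options`."""
    scores = [beta * utility(weights, o) for o in options]
    m = max(scores)  # subtract the max for numerical stability
    z = sum(math.exp(s - m) for s in scores)
    return math.exp(beta * utility(weights, chosen) - m) / z


def posterior(observations, prior=None):
    """Posterior over core profiles given (chosen, options) pairs.

    An observation may be a demonstrated choice or an endorsed value;
    this sketch scores both the same way.
    """
    prior = prior or {h: 1 / len(HYPOTHESES) for h in HYPOTHESES}
    unnorm = {}
    for h, w in HYPOTHESES.items():
        p = prior[h]
        for chosen, options in observations:
            p *= choice_prob(w, chosen, options)
        unnorm[h] = p
    total = sum(unnorm.values())
    return {h: p / total for h, p in unnorm.items()}


def predict(post, chosen, options):
    """Posterior predictive probability of a preference never observed."""
    return sum(post[h] * choice_prob(HYPOTHESES[h], chosen, options) for h in post)


if __name__ == "__main__":
    before = predict(posterior([]), "salad", ["salad", "fast_food"])
    post = posterior([("exercise", ["exercise", "overtime"])])
    after = predict(post, "salad", ["salad", "fast_food"])
    print(round(before, 2), round(after, 2), post)
```

Observing a single preference about work and exercise shifts belief toward the health-focused profile, which in turn raises the predicted probability of an unobserved dietary preference. Under these assumptions, generalization comes from the shared latent structure rather than from the observed preference directly.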

Keywords: artificial intelligence, machine learning, reinforcement learning
