Aligning AI With Shared Human Values

Hendrycks, Dan, Burns, Collin, Basart, Steven, Critch, Andrew, Li, Jerry, Song, Dawn, Steinhardt, Jacob

Sep-21-2020, arXiv.org Artificial Intelligence

We show how to assess a language model's knowledge of basic concepts of morality. We introduce the ETHICS dataset, a new benchmark that spans concepts in justice, well-being, duties, virtues, and commonsense morality. Models predict widespread moral judgments about diverse text scenarios. This requires connecting physical and social world knowledge to value judgments, a capability that may enable us to steer chatbot outputs or eventually regularize open-ended reinforcement learning agents. With the ETHICS dataset, we find that current language models have a promising but incomplete understanding of basic ethical knowledge. Our work shows that progress can be made on machine ethics today, and it provides a stepping stone toward AI that is aligned with human values.

artificial intelligence, health & medicine, scenario

Country:
- Europe (0.14)
- North America (0.14)

Genre:
- Research Report (0.81)

Industry:
- Education (0.93)
- Government (1.00)
- Health & Medicine > Therapeutic Area (0.93)
- Law (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Issues > Social & Ethical Issues (1.00)
  - Machine Learning
    - Neural Networks > Deep Learning (0.46)
    - Reinforcement Learning (1.00)
  - Natural Language (1.00)
  - Representation & Reasoning (1.00)
