Aligning AI With Shared Human Values
Dan Hendrycks, Collin Burns, Steven Basart, Andrew Critch, Jerry Li, Dawn Song, Jacob Steinhardt
arXiv.org Artificial Intelligence
We show how to assess a language model's knowledge of basic concepts of morality. We introduce the ETHICS dataset, a new benchmark that spans concepts in justice, well-being, duties, virtues, and commonsense morality. Models predict widespread moral judgments about diverse text scenarios. This requires connecting physical and social world knowledge to value judgments, a capability that may enable us to steer chatbot outputs or eventually regularize open-ended reinforcement learning agents. With the ETHICS dataset, we find that current language models have a promising but incomplete understanding of basic ethical knowledge. Our work shows that progress can be made on machine ethics today, and it provides a stepping stone toward AI that is aligned with human values.
Sep-21-2020
- Country:
- Europe (0.14)
- North America (0.14)
- Genre:
- Research Report (0.81)
- Industry:
- Education (0.93)
- Government (1.00)
- Health & Medicine > Therapeutic Area (0.93)
- Law (1.00)
- Technology: