Training Value-Aligned Reinforcement Learning Agents Using a Normative Prior
Md Sultan Al Nahian, Spencer Frazier, Brent Harrison, Mark Riedl
arXiv.org Artificial Intelligence
As more machine learning agents interact with humans, there is a growing prospect that an agent trained to perform a task optimally, using only a measure of task performance as feedback, will violate societal norms for acceptable behavior or cause harm. Value alignment is a property of intelligent agents wherein they solely pursue non-harmful behaviors or human-beneficial goals. We introduce an approach to value-aligned reinforcement learning in which we train an agent with two reward signals: a standard task performance reward plus a normative behavior reward. The normative behavior reward is derived from a value-aligned prior model previously shown to classify text as normative or non-normative. We show how variations on a policy shaping technique can balance these two sources of reward and produce policies that are both effective and perceived as more normative. We test our value-alignment technique on three interactive text-based worlds; each world is designed specifically to challenge agents with a task as well as to provide opportunities to deviate from the task to engage in normative and/or altruistic behavior.
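The abstract describes blending a task-performance reward with a normative-behavior reward through policy shaping. As a rough illustration only, and not the paper's exact formulation, the sketch below shows one common multiplicative policy-shaping scheme: the task policy (from Q-values) is reweighted by per-action normativity scores from a prior classifier. The function and parameter names (`shaped_policy`, `blend`, `normative_scores`) are hypothetical.

```python
import numpy as np

def softmax(x, temperature=1.0):
    """Convert a vector of scores into a probability distribution."""
    z = np.asarray(x, dtype=float) / temperature
    z -= z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def shaped_policy(task_q_values, normative_scores, blend=1.0):
    """Multiplicative policy shaping: pi(a) proportional to
    pi_task(a) * pi_norm(a)^blend.

    task_q_values:    per-action Q-values from the task-reward agent
    normative_scores: per-action probabilities that the action's text
                      description is normative (from a prior classifier)
    blend:            weight on the normative signal (0 = task only)
    """
    pi_task = softmax(task_q_values)
    pi_norm = np.asarray(normative_scores, dtype=float) ** blend
    combined = pi_task * pi_norm
    return combined / combined.sum()

# Example: three candidate actions in a text-based world.
q = [2.0, 1.5, 0.3]            # task agent prefers action 0
norm = [0.1, 0.9, 0.8]         # classifier flags action 0 as non-normative
print(shaped_policy(q, norm))  # probability mass shifts toward action 1
```

Varying `blend` trades off task effectiveness against normative behavior, which mirrors the balancing act the paper studies across its policy-shaping variants.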
Apr-19-2021
- Genre:
- Research Report (0.64)
- Industry:
- Leisure & Entertainment > Games > Computer Games (0.46)