Defining Human Values for Value Learners
Sotala, Kaj (Machine Intelligence Research Institute)
Hypothetical “value learning” AIs learn human values and then try to act according to those values. The design of such AIs, however, is hampered by the fact that there exists no satisfactory definition of what exactly human values are. After arguing that the standard concept of preference is insufficient as a definition, I draw on reinforcement learning theory, emotion research, and moral psychology to offer an alternative definition. In this definition, human values are conceptualized as mental representations that encode the brain’s value function (in the reinforcement learning sense) by being imbued with a context-sensitive affective gloss. I finish with a discussion of the implications of this hypothesis for the design of value learners.
Apr-12-2016
- Country:
  - North America > United States
    - New York > New York County
      - New York City (0.04)
    - New Jersey > Mercer County
      - Princeton (0.04)
    - Massachusetts > Middlesex County
      - Cambridge (0.04)
    - California
      - Alameda County > Berkeley (0.04)
      - San Diego County > La Jolla (0.04)
  - Europe
    - Germany > Berlin (0.04)
    - France (0.04)
    - United Kingdom > England
      - Oxfordshire > Oxford (0.04)
    - Netherlands > North Holland
      - Amsterdam (0.04)
  - Asia > India
    - Odisha (0.04)
- Technology: