Scalar reward is not enough: A response to Silver, Singh, Precup and Sutton (2021)

Vamplew, Peter, Smith, Benjamin J., Kallstrom, Johan, Ramos, Gabriel, Radulescu, Roxana, Roijers, Diederik M., Hayes, Conor F., Heintz, Fredrik, Mannion, Patrick, Libin, Pieter J. K., Dazeley, Richard, Foale, Cameron

arXiv.org Artificial Intelligence 

Specifically, Silver, Singh, Precup and Sutton (2021) present the reward-is-enough hypothesis that "Intelligence, and its associated abilities, can be understood as subserving the maximisation of reward by an agent acting in its environment", and argue in favour of reward maximisation as a pathway to the creation of artificial general intelligence (AGI). While others have criticised this hypothesis and its subsequent claims [44,54,60,64], here we argue that Silver et al. have erred in focusing on the maximisation of scalar rewards. The ability to consider multiple conflicting objectives is a critical aspect of both natural and artificial intelligence, and one that will not necessarily arise from, or be adequately addressed by, the maximisation of a scalar reward. Moreover, even if maximising a scalar reward were sufficient to support the emergence of AGI, we contend that this approach is undesirable, as it greatly increases the likelihood of adverse outcomes resulting from the deployment of that AGI. We therefore advocate that a more appropriate model of intelligence should explicitly consider multiple objectives via the use of vector-valued rewards.

Our paper begins by confirming that the reward-is-enough hypothesis refers specifically to scalar rather than vector rewards (Section 2). In Section 3 we consider the limitations of scalar rewards compared with vector rewards, and review the list of intelligent abilities proposed by Silver et al. to determine which of them exhibit multi-objective characteristics. Section 4 identifies multi-objective aspects of natural intelligence (animal and human). Section 5 considers the possibility of vector rewards being derived internally by an agent in response to a global scalar reward.
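To make the scalar-versus-vector distinction concrete, the following minimal Python sketch (our own illustration, not code from the paper or from Silver et al.) contrasts a scalar reward, in which all objectives are collapsed into a single number with weights fixed in advance, with a vector-valued reward, in which each objective is kept as a separate component and the trade-off is applied later through a utility function. The objective names (`task_progress`, `energy_cost`, `risk`) and the weights are hypothetical.

```python
import numpy as np

# Scalar formulation: the designer pre-commits to a single trade-off by
# collapsing all objectives into one number before learning begins.
def scalar_reward(task_progress: float, energy_cost: float, risk: float) -> float:
    # Fixed weights chosen a priori; the agent never "sees" the separate objectives.
    return 1.0 * task_progress - 0.2 * energy_cost - 0.5 * risk

# Vector formulation: each objective is retained as a separate reward component.
def vector_reward(task_progress: float, energy_cost: float, risk: float) -> np.ndarray:
    return np.array([task_progress, -energy_cost, -risk])

def linear_utility(reward_vec: np.ndarray, weights: np.ndarray) -> float:
    # One possible scalarisation; non-linear utility functions are also common
    # in multi-objective reinforcement learning.
    return float(np.dot(reward_vec, weights))

if __name__ == "__main__":
    r_vec = vector_reward(task_progress=1.0, energy_cost=0.3, risk=0.1)
    print("scalar reward:", scalar_reward(1.0, 0.3, 0.1))
    print("vector reward:", r_vec)
    # The same vector can be re-weighted after the fact, without redefining
    # the environment's reward signal.
    print("utility (new weights):", linear_utility(r_vec, np.array([1.0, 0.1, 0.9])))
```

The point of the sketch is only that the vector formulation defers the choice of trade-off: the same reward components can be combined under different utilities, whereas the scalar formulation bakes one trade-off into the reward signal itself.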