Evaluating Agents without Rewards

Matusch, Brendon; Ba, Jimmy; Hafner, Danijar

Dec-21-2020–arXiv.org Artificial Intelligence

[...] solve challenging tasks in unknown environments. However, manually crafting reward functions can be time consuming, expensive, and prone to human error. Competing objectives have been proposed for agents to learn without external supervision, but it has been unclear how well they reflect task rewards or human behavior. To accelerate the development of intrinsic objectives, we retrospectively compute potential objectives on pre-collected datasets of agent behavior, rather than optimizing them online, and compare them by analyzing their correlations. We study input entropy, information gain, and empowerment across seven agents, three Atari games, and the 3D [...]

Objective	Reward Correlation	Human Similarity Correlation
Task Reward	1.00	0.67
Human Similarity	0.67	1.00
Input Entropy	0.54	0.89
Information Gain	0.49	0.79
Empowerment	0.41	0.66

Table 1: We computed Pearson correlation coefficients of each intrinsic objective with task reward and human similarity across 3 Atari games and Minecraft from over 2 billion time steps. The intrinsic objectives correlate more strongly with human similarity than with task reward.
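
The abstract describes comparing objectives by their correlations, and Table 1 reports Pearson correlation coefficients. As a rough illustration of that kind of analysis, the sketch below computes a pairwise Pearson correlation table from per-dataset objective values. The dictionary `scores`, its keys, and every number in it are made up for illustration and are not results from the paper. The grouping of values (one value per dataset of agent behavior) is also an assumption, since the listing does not say how the values are aggregated.

```python
import numpy as np


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()  # center both sequences
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))


# Hypothetical objective values, one entry per pre-collected dataset.
# These numbers are invented for the example, not taken from the paper.
scores = {
    "task_reward": [0.10, 0.40, 0.35, 0.80, 0.90],
    "human_similarity": [0.20, 0.30, 0.50, 0.70, 0.85],
    "input_entropy": [0.15, 0.25, 0.55, 0.60, 0.90],
}

names = list(scores)
# Pairwise correlation table: table[a][b] is the correlation of a with b.
table = {a: {b: pearson(scores[a], scores[b]) for b in names} for a in names}

for a in names:
    print(a.ljust(18) + "  ".join(f"{table[a][b]:.2f}" for b in names))
```

Each objective correlates perfectly with itself, so the diagonal of the table is 1.00, as in the first two rows of Table 1. The same table could be obtained with `np.corrcoef`; the explicit function is used here only to make the formula visible.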

agent, objective, task reward, (12 more...)

Country:
- North America > Canada > Ontario > Toronto (0.14)

Genre:
- Research Report
  - New Finding (0.68)
  - Experimental Study (0.46)

Industry:
- Leisure & Entertainment > Games > Computer Games (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning (1.00)
  - Cognitive Science (0.93)
  - Representation & Reasoning > Agents
    - Agent Societies (0.34)
