Evaluating Agents without Rewards
Matusch, Brendon, Ba, Jimmy, Hafner, Danijar
arXiv.org Artificial Intelligence
[...] solve challenging tasks in unknown environments. However, manually crafting reward functions can be time consuming, expensive, and prone to human error. Competing objectives have been proposed for agents to learn without external supervision, but it has been unclear how well they reflect task rewards or human behavior. To accelerate the development of intrinsic objectives, we retrospectively compute potential objectives on pre-collected datasets of agent behavior, rather than optimizing them online, and compare them by analyzing their correlations. We study input entropy, information gain, and empowerment across seven agents, three Atari games, and the 3D [...]

Table 1: We computed Pearson correlation coefficients of each intrinsic objective with task reward and human similarity across 3 Atari games and Minecraft from over 2 billion time steps. The intrinsic objectives correlate more strongly with human similarity than with task reward.

Objective          Reward Correlation   Human Similarity Correlation
Task Reward        1.00                 0.67
Human Similarity   0.67                 1.00
Input Entropy      0.54                 0.89
Information Gain   0.49                 0.79
Empowerment        0.41                 0.66
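The retrospective comparison described in the abstract amounts to scoring each pre-collected dataset of agent behavior under several objectives and then correlating those scores. The following is a minimal sketch of the correlation step only, not the authors' implementation. The function names `pearson` and `correlation_table` are hypothetical, and the per-dataset scores are made-up placeholder numbers, not results from the paper.

```python
import numpy as np


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape or x.ndim != 1:
        raise ValueError("x and y must be 1-D sequences of the same length")
    xc = x - x.mean()  # center both variables
    yc = y - y.mean()
    denom = np.sqrt((xc @ xc) * (yc @ yc))
    if denom == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return float((xc @ yc) / denom)


def correlation_table(objectives, targets):
    """Correlate every objective with every target.

    Both arguments map a name to one score per dataset of agent behavior
    (hypothetical layout, chosen for illustration).
    """
    return {
        obj_name: {
            tgt_name: pearson(obj_scores, tgt_scores)
            for tgt_name, tgt_scores in targets.items()
        }
        for obj_name, obj_scores in objectives.items()
    }


# Placeholder scores, one entry per (agent, environment) dataset.
# These numbers are invented for illustration and carry no empirical meaning.
targets = {
    "task_reward": [0.1, 0.4, 0.5, 0.9, 0.7],
    "human_similarity": [0.2, 0.3, 0.6, 0.8, 0.9],
}
objectives = {
    "input_entropy": [0.3, 0.2, 0.7, 0.8, 0.9],
    "information_gain": [0.1, 0.5, 0.4, 0.7, 0.8],
    "empowerment": [0.4, 0.3, 0.5, 0.6, 0.9],
}

table = correlation_table(objectives, targets)
for name, row in table.items():
    print(name, {k: round(v, 2) for k, v in row.items()})
```

Because the objectives are evaluated on fixed datasets rather than optimized online, adding a new candidate objective in this sketch only requires computing one more list of scores and re-running `correlation_table`.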
Dec-21-2020
- Country:
- North America > Canada > Ontario > Toronto (0.14)
- Genre:
- Research Report
- New Finding (0.68)
- Experimental Study (0.46)
- Industry:
- Leisure & Entertainment > Games > Computer Games (1.00)
- Technology: