Leveraging Procedural Generation to Benchmark Reinforcement Learning

Cobbe, Karl, Hesse, Christopher, Hilton, Jacob, Schulman, John

Dec-3-2019–arXiv.org Machine Learning

This evidence raises the possibility that overfitting pervades classic benchmarks like the Arcade Learning Environment (ALE) (Bellemare et al., 2013), which has long served as a gold standard in RL. While the diversity between games in the ALE is one of the benchmark's greatest strengths, the low emphasis on generalization presents a significant drawback. Previous work has sought to alleviate overfitting in the ALE by introducing sticky actions (Machado et al., 2018) or by embedding natural videos as backgrounds (Zhang et al., 2018b), but these methods only superficially address the underlying problem -- that agents perpetually encounter near-identical states. For each game the question must be asked: are agents robustly learning a relevant skill, or are they approximately memorizing specific trajectories? There have been several investigations of generalization in RL (Farebrother et al., 2018; Packer et al., 2018; Zhang et al., 2018a; Lee et al., 2019), but progress has largely proved elusive. Arguably one of the principal setbacks has been the lack of environments well-suited to measure generalization.
