Evolving Reinforcement Learning Algorithms

Co-Reyes, John D., Miao, Yingjie, Peng, Daiyi, Real, Esteban, Levine, Sergey, Le, Quoc V., Lee, Honglak, Faust, Aleksandra

Jan-8-2021–arXiv.org Artificial Intelligence

We propose a method for meta-learning reinforcement learning algorithms by searching over the space of computational graphs which compute the loss function for a value-based model-free RL agent to optimize. The learned algorithms are domain-agnostic and can generalize to new environments not seen during training. Our method can both learn from scratch and bootstrap off known existing algorithms, like DQN, enabling interpretable modifications which improve performance. Bootstrapped from DQN, we highlight two learned algorithms which obtain good generalization performance over other classical control tasks, gridworld-type tasks, and Atari games. The analysis of the learned algorithm behavior shows resemblance to recently proposed RL algorithms that address overestimation in value-based methods.

Designing new deep reinforcement learning algorithms that can efficiently solve a wide variety of problems generally requires a tremendous amount of manual effort. Learning to design reinforcement learning algorithms, or even small sub-components of algorithms, would help ease this burden and could result in better algorithms than researchers could design manually. Our work might then shift from designing these algorithms manually to designing the language and optimization methods for developing these algorithms automatically. A reinforcement learning algorithm can be viewed as a procedure that maps an agent's experience to a policy that obtains high cumulative reward over the course of training. We formulate the problem of training an agent as one of meta-learning: an outer loop searches over the space of computational graphs or programs that compute the objective function for the agent to minimize, and an inner loop performs the updates using the learned loss function. The objective of the outer loop is to maximize the training return of the inner loop algorithm.
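The outer-loop and inner-loop structure described above can be illustrated with a toy sketch. This is not the paper's search procedure: the search over computational graphs is replaced here by exhaustively scoring three hand-written candidate losses, and the chain environment, the tabular agent, the finite-difference gradient, and every name and hyperparameter below are assumptions made for illustration only.

```python
import random

GAMMA = 0.9          # discount factor (assumed value)
N_STATES = 6         # toy chain 0..5; reaching state 5 ends the episode with reward 1
ACTIONS = (-1, +1)   # move left or move right

# Hypothetical candidate losses: each maps (q_sa, reward, max_next_q) to a scalar.
def td_squared(q, r, q_next):
    # Squared one-step TD error, as in standard Q-learning.
    return (q - (r + GAMMA * q_next)) ** 2

def reward_only(q, r, q_next):
    # Regresses Q onto the immediate reward and ignores the bootstrap term.
    return (q - r) ** 2

def constant_zero(q, r, q_next):
    # Provides no learning signal at all.
    return 0.0

CANDIDATES = {f.__name__: f for f in (td_squared, reward_only, constant_zero)}

def inner_loop(loss_fn, episodes=300, max_steps=10, lr=0.25, eps=0.1, seed=0):
    """Train a tabular Q agent by descending loss_fn; return total training reward."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    total = 0.0
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            # Epsilon-greedy action selection with random tie-breaking.
            if rng.random() < eps or q[s][0] == q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s2 = min(max(s + ACTIONS[a], 0), N_STATES - 1)
            done = s2 == N_STATES - 1
            r = 1.0 if done else 0.0
            q_next = 0.0 if done else max(q[s2])
            # Central finite difference w.r.t. Q(s, a); the target is held fixed.
            h = 1e-4
            grad = (loss_fn(q[s][a] + h, r, q_next)
                    - loss_fn(q[s][a] - h, r, q_next)) / (2 * h)
            q[s][a] -= lr * grad
            total += r
            s = s2
            if done:
                break
    return total

def outer_loop(candidates):
    """Score every candidate loss by its training return and pick the best one."""
    scores = {name: inner_loop(fn) for name, fn in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores

best, scores = outer_loop(CANDIDATES)
print(best, scores)
```

In this toy setting the outer loop's objective is the sum of rewards collected while the inner loop trains, which mirrors the stated goal of maximizing training return. A real search would generate and mutate candidate loss programs rather than enumerate a fixed list.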

Country:
- North America > United States
  - Michigan > Washtenaw County
    - Ann Arbor (0.04)
  - Louisiana > Orleans Parish
    - New Orleans (0.04)
  - California > Santa Clara County
    - Mountain View (0.04)

Genre:
- Research Report (0.40)

Industry:
- Education (0.48)
- Leisure & Entertainment > Games
  - Computer Games (0.54)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
