Are Deep Policy Gradient Algorithms Truly Policy Gradient Algorithms?
Andrew Ilyas, Logan Engstrom, Shibani Santurkar, Dimitris Tsipras, Firdaus Janoos, Larry Rudolph, Aleksander Madry
Deep reinforcement learning (RL) is at the core of some of the most publicized achievements of modern machine learning [19, 9, 1, 10]. To many, this framework embodies the promise of real-world impact from machine learning. However, the deep RL toolkit has not yet attained the engineering stability of, for example, current deep (supervised) learning frameworks. Indeed, recent studies [3] demonstrate that state-of-the-art deep RL algorithms suffer from oversensitivity to hyperparameter choices, lack of consistency, and poor reproducibility. This state of affairs suggests that it may be necessary to reexamine the conceptual underpinnings of deep RL methodology. More precisely, the overarching question that motivates this work is: To what degree does the current practice of deep RL reflect the principles that informed its development? The specific focus of this paper is deep policy gradient methods, a widely used class of deep RL algorithms. Our goal is to explore the extent to which state-of-the-art implementations of these methods succeed in realizing the key primitives of the general policy gradient framework.
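For context (this formula does not appear in the excerpt above), the "general policy gradient framework" referred to here rests on the classical policy gradient theorem; one standard form of the resulting gradient estimator for a parameterized policy $\pi_\theta$ is
\[
\nabla_\theta J(\theta) \;=\; \mathbb{E}_{\tau \sim \pi_\theta}\!\left[\sum_{t=0}^{T} \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, A^{\pi_\theta}(s_t, a_t)\right],
\]
where $J(\theta)$ is the expected return, $\tau$ is a trajectory sampled by running $\pi_\theta$, and $A^{\pi_\theta}$ is an advantage (or return) estimate. Deep policy gradient methods optimize sample-based approximations of this quantity; the primitives in question include the gradient estimate itself, value prediction, and the surrogate objectives and trust regions built on top of it.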
November 12, 2018