All-Action Policy Gradient Methods: A Numerical Integration Approach
Benjamin Petit, Loren Amdahl-Culleton, Yao Liu, Jimmy Smith, Pierre-Luc Bacon
arXiv.org Artificial Intelligence
While often stated as an instance of the likelihood ratio trick [Rubinstein, 1989], the original policy gradient theorem [Sutton, 1999] involves an integral over the action space. When this integral can be computed, the resulting "all-action" estimator [Sutton, 2001] provides a conditioning effect [Bratley, 1987] that significantly reduces variance compared to the REINFORCE estimator [Williams, 1992]. In this paper, we adopt a numerical integration perspective to broaden the applicability of the all-action estimator to general spaces and to any function class for the policy or critic components, beyond the Gaussian case considered by [Ciosek, 2018]. In addition, we provide a new theoretical result on the effect of using a biased critic, which offers more guidance than the previous "compatible features" condition of [Sutton, 1999]. We demonstrate the benefit of our approach on continuous control tasks with nonlinear function approximation, where our results show improved performance and sample efficiency.
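The contrast between the two estimators can be illustrated with a minimal sketch. It is a toy built on assumptions that are not from the paper: a single fixed state, a one-dimensional Gaussian policy whose only parameter is its mean `mu`, a hypothetical quadratic `critic`, and a plain trapezoid rule standing in for whatever quadrature scheme one might actually choose. The REINFORCE estimate uses one sampled action, while the all-action estimate numerically integrates the gradient of the policy density times the critic over the action axis.

```python
import numpy as np


def gaussian_pdf(a, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at a."""
    return np.exp(-0.5 * ((a - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))


def critic(a):
    """Hypothetical critic Q(s, a) for one fixed state (an assumption)."""
    return -(a - 2.0) ** 2


def reinforce_grad(mu, sigma, rng):
    """Single-sample likelihood-ratio estimate of dJ/dmu."""
    a = rng.normal(mu, sigma)
    score = (a - mu) / sigma ** 2  # d/dmu of log pi(a)
    return score * critic(a)


def all_action_grad(mu, sigma, n_points=2001, width=8.0):
    """Integrate d/dmu pi(a) * Q(a) over actions with the trapezoid rule."""
    a = np.linspace(mu - width * sigma, mu + width * sigma, n_points)
    # d/dmu pi(a) = pi(a) * (a - mu) / sigma^2 for a Gaussian policy
    integrand = gaussian_pdf(a, mu, sigma) * (a - mu) / sigma ** 2 * critic(a)
    da = a[1] - a[0]
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1])) * da)


rng = np.random.default_rng(0)
mu, sigma = 0.0, 1.0

# For this toy problem J(mu) = -((mu - 2)^2 + sigma^2), so dJ/dmu = -2 (mu - 2).
exact = -2.0 * (mu - 2.0)

samples = np.array([reinforce_grad(mu, sigma, rng) for _ in range(10_000)])
integrated = all_action_grad(mu, sigma)

print("exact gradient:        ", exact)
print("all-action (trapezoid):", integrated)
print("REINFORCE mean / std:  ", samples.mean(), samples.std())
```

In this single-state setting the integrated estimate involves no action sampling at all, so the spread seen across the REINFORCE samples disappears. In a full control task the state would still be sampled and the critic would be learned, so the reduction would be partial rather than total.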
Oct-20-2019