Faster Policy Learning with Continuous-Time Gradients
Ainsworth, Samuel; Lowrey, Kendall; Thickstun, John; Harchaoui, Zaid; Srinivasa, Siddhartha
We study the estimation of policy gradients for continuous-time systems with known dynamics. By reframing policy learning in continuous time, we show that it is possible to construct a more efficient and accurate gradient estimator. The standard back-propagation through time (BPTT) estimator computes exact gradients for a crude discretization of the continuous-time system. In contrast, we approximate continuous-time gradients in the original system. With the explicit goal of estimating continuous-time gradients, we are able to discretize adaptively and construct a more efficient policy gradient estimator, which we call the Continuous-Time Policy Gradient (CTPG). We show that replacing BPTT policy gradients with more efficient CTPG estimates results in faster and more robust learning in a variety of control tasks and simulators.
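The contrast drawn in the abstract can be illustrated on a toy problem. The sketch below is not the authors' CTPG implementation. It assumes a hypothetical scalar system dx/dt = A*x + B*u with linear policy u = -k*x and running cost x^2 + u^2, chosen because its policy gradient has a closed form. `bptt_gradient` returns the exact gradient of a fixed-step Euler discretization (a reverse pass), while `continuous_gradient` approximates the continuous-time gradient by integrating forward sensitivity equations with an adaptive step-doubling RK4 integrator. The sensitivity formulation and the integrator are illustrative choices, not details taken from the paper, and all names and constants are assumptions.

```python
import math

# Hypothetical scalar system dx/dt = A*x + B*u with policy u = -k*x and
# running cost x^2 + u^2 on [0, T]. All constants are illustrative.
A, B, X0, T = 1.0, 1.0, 1.0, 2.0

def exact_gradient(k):
    """Closed-form dJ/dk for this toy problem (assumes B*k != A)."""
    c = B * k - A
    e = math.exp(-2.0 * c * T)
    F = (1.0 - e) / (2.0 * c)
    dF = (2.0 * c * T * e - (1.0 - e)) / (2.0 * c * c)
    return X0 * X0 * (2.0 * k * F + (1.0 + k * k) * B * dF)

def bptt_gradient(k, n_steps):
    """Exact gradient of the Euler-discretized cost, via a reverse pass."""
    h, m = T / n_steps, A - B * k
    xs = [X0]
    for _ in range(n_steps - 1):
        xs.append(xs[-1] * (1.0 + h * m))       # forward Euler rollout
    lam, grad = 0.0, 0.0                        # lam holds dJ/dx_{n+1}
    for x in reversed(xs):
        grad += h * 2.0 * k * x * x - lam * h * B * x
        lam = 2.0 * h * (1.0 + k * k) * x + lam * (1.0 + h * m)
    return grad

def augmented_rhs(y, k):
    """State x, sensitivity s = dx/dk, and accumulated gradient g = dJ/dk."""
    x, s, g = y
    m = A - B * k
    return [m * x, m * s - B * x,
            2.0 * k * x * x + 2.0 * (1.0 + k * k) * x * s]

def rk4_step(f, y, h):
    k1 = f(y)
    k2 = f([yi + 0.5 * h * ki for yi, ki in zip(y, k1)])
    k3 = f([yi + 0.5 * h * ki for yi, ki in zip(y, k2)])
    k4 = f([yi + h * ki for yi, ki in zip(y, k3)])
    return [yi + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]

def continuous_gradient(k, tol=1e-9):
    """Continuous-time gradient estimate with adaptive step-doubling RK4."""
    f = lambda y: augmented_rhs(y, k)
    t, h, y, accepted = 0.0, T / 10.0, [X0, 0.0, 0.0], 0
    while T - t > 1e-12:
        h = min(h, T - t)
        full = rk4_step(f, y, h)
        half = rk4_step(f, rk4_step(f, y, 0.5 * h), 0.5 * h)
        err = max(abs(p - q) for p, q in zip(full, half))
        if err <= tol:                          # accept the finer solution
            t, y, accepted = t + h, half, accepted + 1
        # Grow or shrink the step based on the local error estimate.
        h *= min(2.0, max(0.2, 0.9 * (tol / max(err, 1e-16)) ** 0.2))
    return y[2], accepted

if __name__ == "__main__":
    k = 3.0
    grad_ct, n_accepted = continuous_gradient(k)
    print("closed form       :", exact_gradient(k))
    print("BPTT, 20 Euler    :", bptt_gradient(k, 20))
    print("adaptive estimate :", grad_ct, "in", n_accepted, "accepted steps")
```

In this toy setting the coarse Euler gradient is exact for the discretized problem but biased with respect to the continuous-time one, whereas the adaptive estimate tracks the closed form closely. That is the distinction the abstract draws, shown here only on a one-dimensional example.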
Dec-11-2020
- Country:
  - Europe
    - France (0.04)
    - United Kingdom > England
      - Cambridgeshire > Cambridge (0.04)
  - North America > United States
    - Massachusetts > Middlesex County
      - Cambridge (0.04)
    - New York (0.04)
- Genre:
  - Research Report (0.50)
- Industry:
  - Government > Military (0.46)
- Technology: