On the Search for Feedback in Reinforcement Learning
Wang, Ran; Parunandi, Karthikeya S.; Yu, Dan; Kalathil, Dileep; Chakravorty, Suman
This paper addresses the problem of learning the optimal feedback policy for a nonlinear stochastic dynamical system with continuous state space, continuous action space, and unknown dynamics. Feedback policies are complex objects that typically require a high-dimensional parametrization, which makes Reinforcement Learning algorithms that search for an optimum in this large parameter space sample-inefficient and subject to high variance. We propose a "decoupling" principle that drastically reduces the feedback parameter space while remaining near-optimal to fourth order in a small noise parameter. Based on this principle, we propose a decoupled data-based control (D2C) algorithm that addresses the stochastic control problem in two steps. First, an open-loop deterministic trajectory optimization problem is solved using a black-box simulation model of the dynamical system. Then, a linear closed-loop control is developed around this nominal trajectory, again using only a simulation model. Empirical evidence suggests a significant reduction in both training time and training variance compared to other state-of-the-art Reinforcement Learning algorithms.
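The two-step structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The pendulum simulator `step`, the cost weights, the finite-difference gradient descent with backtracking, and the time-varying LQR feedback design are all assumptions chosen for illustration; the abstract states only that both steps rely on a black-box simulation model.

```python
import numpy as np

DT = 0.1  # assumed integration step

def step(x, u):
    """Hypothetical black-box simulator: one Euler step of a damped pendulum."""
    theta, omega = x
    return np.array([theta + DT * omega,
                     omega + DT * (-np.sin(theta) - 0.1 * omega + u[0])])

def rollout(x0, U):
    """Deterministic (noise-free) rollout of the open-loop control sequence U."""
    X = [np.asarray(x0, dtype=float)]
    for u in U:
        X.append(step(X[-1], u))
    return np.array(X)

def cost(U, x0, x_goal):
    """Assumed cost: control effort plus terminal distance to the goal."""
    X = rollout(x0, U)
    return 0.01 * np.sum(U ** 2) + 10.0 * np.sum((X[-1] - x_goal) ** 2)

def optimize_open_loop(x0, x_goal, T=20, iters=50, eps=1e-5):
    """Step 1: open-loop trajectory optimization using simulator queries only."""
    U = np.zeros((T, 1))
    for _ in range(iters):
        base = cost(U, x0, x_goal)
        grad = np.zeros_like(U)
        for i in range(T):  # central finite-difference gradient
            dU = np.zeros_like(U)
            dU[i, 0] = eps
            grad[i, 0] = (cost(U + dU, x0, x_goal)
                          - cost(U - dU, x0, x_goal)) / (2 * eps)
        lr = 1.0  # backtracking: halve the step until the cost decreases
        while lr > 1e-8 and cost(U - lr * grad, x0, x_goal) >= base:
            lr *= 0.5
        if lr <= 1e-8:
            break
        U = U - lr * grad
    return U

def linearize(x, u, eps=1e-5):
    """Finite-difference Jacobians of the simulator at a nominal (x, u)."""
    n, m = len(x), len(u)
    A, B = np.zeros((n, n)), np.zeros((n, m))
    for j in range(n):
        d = np.zeros(n)
        d[j] = eps
        A[:, j] = (step(x + d, u) - step(x - d, u)) / (2 * eps)
    for j in range(m):
        d = np.zeros(m)
        d[j] = eps
        B[:, j] = (step(x, u + d) - step(x, u - d)) / (2 * eps)
    return A, B

def lqr_gains(X, U, Q, R, Qf):
    """Step 2: time-varying LQR gains around the nominal trajectory."""
    P, K = Qf, [None] * len(U)
    for t in reversed(range(len(U))):
        A, B = linearize(X[t], U[t])
        K[t] = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K[t])  # Riccati recursion
    return np.array(K)

def closed_loop(x0, X_nom, U_nom, K, noise_std=0.0, seed=0):
    """Apply u_t = u_nom_t - K_t (x_t - x_nom_t) under additive state noise."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    traj = [x]
    for t in range(len(U_nom)):
        u = U_nom[t] - K[t] @ (x - X_nom[t])
        x = step(x, u) + noise_std * rng.standard_normal(x.shape)
        traj.append(x)
    return np.array(traj)
```

In this sketch the only parameters searched over are the T open-loop controls; the feedback gains then follow from a backward Riccati recursion rather than from a search over a large policy parametrization, which is one way to read the "decoupling" idea.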
Feb-20-2020
- Country:
  - Africa > Togo (0.04)
  - North America > United States
    - New York (0.04)
    - Texas > Brazos County
      - College Station (0.04)
    - Massachusetts > Middlesex County
      - Belmont (0.04)
  - Asia > China
    - Jiangsu Province > Nanjing (0.04)
- Genre:
  - Research Report (0.50)
- Industry:
  - Energy (0.50)
  - Transportation > Air (0.34)
- Technology: