On the Search for Feedback in Reinforcement Learning

Wang, Ran, Parunandi, Karthikeya S., Yu, Dan, Kalathil, Dileep, Chakravorty, Suman

Feb-20-2020–arXiv.org Machine Learning

This paper addresses the problem of learning the optimal feedback policy for a nonlinear stochastic dynamical system with continuous state space, continuous action space, and unknown dynamics. Feedback policies are complex objects that typically need a high-dimensional parametrization, which makes Reinforcement Learning algorithms that search for an optimum in this large parameter space sample-inefficient and subject to high variance. We propose a "decoupling" principle that drastically reduces the feedback parameter space while remaining near-optimal to fourth order in a small noise parameter. Based on this principle, we propose a decoupled data-based control (D2C) algorithm that addresses the stochastic control problem in two steps. First, an open-loop deterministic trajectory optimization problem is solved using a black-box simulation model of the dynamical system. Then, a linear closed-loop controller is designed around this nominal trajectory, again using only the simulation model. Empirical evidence suggests a significant reduction in training time, as well as in training variance, compared to other state-of-the-art Reinforcement Learning algorithms.
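
The two-step structure described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption for illustration, not the authors' implementation. The black-box simulator is a hypothetical undamped pendulum (`simulate_step`). The open-loop step (`optimize_open_loop`) uses central finite-difference gradient descent with a backtracking line search. The closed-loop step fits a linear time-varying model (A_t, B_t) to small perturbations around the nominal trajectory by least squares (`identify_ltv`) and then computes time-varying LQR gains K_t (`tvlqr_gains`). The resulting policy is u_t = ubar_t + K_t (x_t - xbar_t), so only T small gain matrices are tuned rather than a general feedback parametrization.

```python
import numpy as np

# --- Hypothetical black-box simulator (an assumption for illustration) ---
# Undamped pendulum, state x = [angle, angular velocity], torque input u (shape (1,)).
DT, T = 0.1, 30
Q, R, QF = np.eye(2), 0.01 * np.eye(1), 10.0 * np.eye(2)
X0 = np.array([1.0, 0.0])


def simulate_step(x, u, noise=None):
    """One Euler step; the algorithm below only queries inputs and outputs."""
    theta, omega = x
    x_next = np.array([theta + DT * omega,
                       omega + DT * (-9.81 * np.sin(theta) + u[0])])
    return x_next if noise is None else x_next + noise


def rollout_cost(U_nom, K=None, X_nom=None, noises=None):
    """Quadratic cost of one rollout; adds linear feedback when K is given."""
    x, cost = X0.copy(), 0.0
    for t in range(T):
        u = U_nom[t] if K is None else U_nom[t] + K[t] @ (x - X_nom[t])
        cost += x @ Q @ x + u @ R @ u
        x = simulate_step(x, u, None if noises is None else noises[t])
    return cost + x @ QF @ x


def optimize_open_loop(iters=50, fd_eps=1e-5):
    """Step 1: deterministic open-loop optimization from rollouts only
    (central finite-difference gradient, backtracking line search)."""
    U = np.zeros((T, 1))
    J = rollout_cost(U)
    for _ in range(iters):
        grad = np.zeros_like(U)
        for i in range(T):
            e = np.zeros_like(U)
            e[i, 0] = fd_eps
            grad[i, 0] = (rollout_cost(U + e) - rollout_cost(U - e)) / (2 * fd_eps)
        alpha = 1.0
        while alpha > 1e-8:  # halve the step until the cost decreases
            U_try = U - alpha * grad
            J_try = rollout_cost(U_try)
            if J_try < J:
                U, J = U_try, J_try
                break
            alpha *= 0.5
    return U


def nominal_states(U):
    """Noise-free states x_0..x_T visited under the open-loop controls."""
    X = [X0]
    for t in range(T):
        X.append(simulate_step(X[-1], U[t]))
    return np.array(X)


def identify_ltv(X, U, n_samples=20, sigma=1e-3, seed=0):
    """Step 2a: least-squares fit of dx_{t+1} ~ A_t dx_t + B_t du_t from
    small random perturbations around the nominal trajectory."""
    rng = np.random.default_rng(seed)
    A, B = [], []
    for t in range(T):
        dX = sigma * rng.standard_normal((n_samples, 2))
        dU = sigma * rng.standard_normal((n_samples, 1))
        dY = np.array([simulate_step(X[t] + dx, U[t] + du) - X[t + 1]
                       for dx, du in zip(dX, dU)])
        coef, *_ = np.linalg.lstsq(np.hstack([dX, dU]), dY, rcond=None)
        A.append(coef[:2].T)  # coef stacks [A_t^T; B_t^T], shape (3, 2)
        B.append(coef[2:].T)
    return np.array(A), np.array(B)


def tvlqr_gains(A, B):
    """Step 2b: backward Riccati recursion for time-varying LQR gains K_t."""
    P, K = QF, np.zeros((T, 1, 2))
    for t in reversed(range(T)):
        K[t] = -np.linalg.solve(R + B[t].T @ P @ B[t], B[t].T @ P @ A[t])
        P = Q + A[t].T @ P @ (A[t] + B[t] @ K[t])
    return K


U_nom = optimize_open_loop()
X_nom = nominal_states(U_nom)
A, B = identify_ltv(X_nom, U_nom)
K = tvlqr_gains(A, B)

# Evaluate both policies on identical noise draws (common random numbers).
rng = np.random.default_rng(1)
noise_draws = 0.05 * rng.standard_normal((300, T, 2))
open_cost = np.mean([rollout_cost(U_nom, noises=n) for n in noise_draws])
closed_cost = np.mean([rollout_cost(U_nom, K, X_nom, n) for n in noise_draws])
print(f"open-loop: {open_cost:.3f}  closed-loop: {closed_cost:.3f}")
```

Evaluating both policies on the same noise draws makes the effect of the feedback term visible with relatively few samples. The noise scale (0.05), the cost weights, and the pendulum model are arbitrary choices for this sketch and say nothing about the paper's benchmarks.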

algorithm, equation, noise

Country:
- Africa > Togo (0.04)
- North America > United States
  - New York (0.04)
  - Texas > Brazos County
    - College Station (0.04)
  - Massachusetts > Middlesex County
    - Belmont (0.04)
- Asia > China
  - Jiangsu Province > Nanjing (0.04)

Genre:
- Research Report (0.50)

Industry:
- Energy (0.50)
- Transportation > Air (0.34)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)