Structured Policy Iteration for Linear Quadratic Regulator

Park, Youngsuk; Rossi, Ryan A.; Wen, Zheng; Wu, Gang; Zhao, Handong

arXiv.org Artificial Intelligence 

Linear quadratic regulator (LQR) is one of the most popular frameworks to tackle continuous Markov decision process tasks. With its fundamental theory and tractable optimal policy, LQR has been revisited and analyzed in recent years, in terms of reinforcement learning scenarios such as the model-free or model-based setting. In this paper, we introduce the Structured Policy Iteration (S-PI) for LQR, a method capable of deriving a structured linear policy. Such a structured policy with (block) sparsity or low-rank can have significant advantages over the standard LQR policy: more interpretable, memory-efficient, and well-suited for the distributed setting.

This stochastic control has led to a wide class of fundamental machinery along the way, across theoretical analysis as well as tractable algorithms, where the model of transition dynamic and cost function are known. On the other hand, under the uncertain model of transition dynamics, reinforcement learning (RL) and data-driven approaches have achieved a great empirical success in recent years, from simulated game scenarios (Mnih et al., 2015; Silver et al., 2016) to robot manipulation (Tassa et al., 2012; Al Borno et al., 2012; Kumar et al., 2016). In recent years, LQR in discrete time domain in particular, has been revisited and analyzed under model uncertainty, not only in theoretical perspective like regret bound or sample
