Regret Minimization in Partially Observable Linear Quadratic Control

Sahin Lale, Kamyar Azizzadenesheli, Babak Hassibi, Anima Anandkumar

arXiv.org Machine Learning 

Controlling unknown discrete-time systems is a fundamental problem in adaptive control and reinforcement learning. In this problem, an agent interacts with an environment with unknown dynamics and aims to minimize the overall average regulating costs. To achieve this goal, the agent is required to explore the environment to gain a better understanding of its dynamics, a task often called system identification. The agent then uses this understanding to design a set of improved controllers that simultaneously reduce the possible future costs and enable the agent to explore the important and unknown aspects of the system. In recent decades, this challenging problem has been extensively studied, resulting in a set of foundational steps toward studying the stability and asymptotic convergence to optimal controllers [Lai et al., 1982, Lai and Wei, 1987]. While asymptotic analyses set the ground for the design of optimal control, understanding the finite-time behavior of adaptive algorithms is critical for real-world applications. In practice, one might prefer an algorithm that guarantees better performance on a much shorter horizon. Recent developments in the fields of statistics and machine learning, along with control theory [Van Der Vaart and Wellner, 1996, Peña et al., 2009, Lai et al., 1982], empower us not only to advance the study of the asymptotic efficiency of algorithms but also to analyze their finite-time behavior [Fiechter, 1997, Abbasi-Yadkori and Szepesvári, 2011]. In partially observable linear quadratic control, if the agent is handed the system dynamics a priori, the optimal control/policy has a closed form in the presence of Gaussian disturbances.
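To make the closed form mentioned above concrete, here is a minimal sketch for a scalar system, assuming dynamics x_{t+1} = a x_t + b u_t + w_t, observations y_t = c x_t + z_t, Gaussian noises with variances w_var and z_var, and a per-step cost q x_t^2 + r u_t^2. These symbols, the cost form, and the function names are illustrative assumptions, not notation taken from the paper. Under those assumptions the known-dynamics solution combines a Kalman filter with a linear feedback on the state estimate, and both gains come from a Riccati equation.

```python
def solve_scalar_riccati(a, b, q, r, iters=1000, tol=1e-12):
    """Fixed-point iteration for p = q + a^2 p - (a b p)^2 / (r + b^2 p).

    Hypothetical helper for the scalar case only; the matrix case would
    need a proper discrete algebraic Riccati solver.
    """
    p = q
    for _ in range(iters):
        p_next = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
        if abs(p_next - p) < tol:
            return p_next
        p = p_next
    return p


def lqg_gains(a, b, c, q, r, w_var, z_var):
    """Return (k, l): feedback gain and Kalman gain for the assumed scalar model.

    Control law: u_t = -k * x_hat_t, where x_hat_t is the filtered estimate.
    Filter update: x_hat_t = x_pred_t + l * (y_t - c * x_pred_t).
    """
    # Control Riccati equation gives the cost-to-go coefficient p.
    p = solve_scalar_riccati(a, b, q, r)
    k = a * b * p / (r + b * b * p)
    # The filter Riccati equation is the dual problem: b -> c, q -> w_var, r -> z_var.
    s = solve_scalar_riccati(a, c, w_var, z_var)
    l = s * c / (c * c * s + z_var)
    return k, l


k, l = lqg_gains(a=1.0, b=1.0, c=1.0, q=1.0, r=1.0, w_var=1.0, z_var=1.0)
closed_loop = 1.0 - 1.0 * k  # a - b*k, should have magnitude below 1
```

In the adaptive setting described in the abstract, a, b, and c are unknown, so gains of this kind would have to be recomputed from estimated parameters rather than from the true ones.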
