Learning Linear-Quadratic Regulators Efficiently with only $\sqrt{T}$ Regret
Alon Cohen, Tomer Koren, Yishay Mansour
Optimal control theory dates back to the 1950s and has been applied successfully to numerous real-world engineering problems (e.g., Bermúdez and Martinez, 1994; Chen and Islam, 2005; Lenhart and Workman, 2007; Geering, 2007). Classical results in control theory pertain to asymptotic convergence and stability of dynamical systems; recently, there has been renewed interest in such problems from a learning-theoretic perspective, with a focus on finite-time convergence guarantees and computational tractability. Perhaps the most well-studied model in optimal control is Linear-Quadratic (LQ) control. In this model, both the state and the action are real-valued vectors. The dynamics of the environment are linear in both the state and the action, and are perturbed by Gaussian noise; the cost is quadratic in the state and action vectors.
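The LQ model described above can be sketched in a few lines: dynamics $x_{t+1} = Ax_t + Bu_t + w_t$ with Gaussian noise $w_t$, and per-step cost $x_t^\top Q x_t + u_t^\top R u_t$. The matrices `A`, `B`, `Q`, `R` and the fixed linear policy gain `K` below are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Illustrative LQ system (all matrices are assumptions for this sketch).
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])        # state transition matrix
B = np.array([[0.0],
              [0.1]])             # control input matrix
Q = np.eye(2)                     # quadratic state-cost matrix
R = np.eye(1)                     # quadratic action-cost matrix
K = np.array([[0.5, 1.0]])        # hypothetical linear feedback policy u = -K x

rng = np.random.default_rng(0)
x = np.zeros(2)
total_cost = 0.0
T = 100
for t in range(T):
    u = -K @ x                              # action is linear in the state
    total_cost += x @ Q @ x + u @ R @ u     # cost is quadratic in state and action
    w = rng.normal(scale=0.1, size=2)       # Gaussian process noise
    x = A @ x + B @ u + w                   # linear dynamics perturbed by noise

avg_cost = total_cost / T
print(avg_cost)
```

Under a stabilizing policy such as this one, the average cost settles to a finite value governed by the noise level; the learning problem the paper studies is choosing the policy online, without knowing $A$ and $B$ in advance.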
Feb-17-2019