Efficient Reinforcement Learning for High Dimensional Linear Quadratic Systems

Neural Information Processing Systems 

We study the problem of adaptive control of a high dimensional linear quadratic (LQ) system. Previous work established the asymptotic convergence to an optimal controller for various adaptive control schemes. More recently, for the average cost LQ problem, a regret bound of $O(\sqrt{T})$ was shown, apart from logarithmic factors.