Dean, Sarah, Mania, Horia, Matni, Nikolai, Recht, Benjamin, Tu, Stephen

We consider adaptive control of the Linear Quadratic Regulator (LQR), where an unknown linear system is controlled subject to quadratic costs. Leveraging recent developments in the estimation of linear systems and in robust controller synthesis, we present the first provably polynomial time algorithm that achieves sub-linear regret on this problem. We further study the interplay between regret minimization and parameter estimation by proving a lower bound on the expected regret in terms of the exploration schedule used by any algorithm. Finally, we conduct a numerical study comparing our robust adaptive algorithm to other methods from the adaptive LQR literature, and demonstrate the flexibility of our proposed method by extending it to a demand forecasting problem subject to state constraints. Papers published at the Neural Information Processing Systems Conference.

Dean, Sarah, Mania, Horia, Matni, Nikolai, Recht, Benjamin, Tu, Stephen

We consider adaptive control of the Linear Quadratic Regulator (LQR), where an unknown linear system is controlled subject to quadratic costs. Leveraging recent developments in the estimation of linear systems and in robust controller synthesis, we present the first provably polynomial time algorithm that provides high probability guarantees of sub-linear regret on this problem. We further study the interplay between regret minimization and parameter estimation by proving a lower bound on the expected regret in terms of the exploration schedule used by any algorithm. Finally, we conduct a numerical study comparing our robust adaptive algorithm to other methods from the adaptive LQR literature, and demonstrate the flexibility of our proposed method by extending it to a demand forecasting problem subject to state constraints.
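
As a rough illustration of the estimate-then-control loop that adaptive LQR methods build on, the sketch below fits a scalar system $x_{t+1} = a x_t + b u_t + w_t$ by least squares and then computes an LQR gain from the estimates. It is a plain certainty-equivalent controller, not the robust synthesis procedure described in the abstract. The function names, the scalar setting, and all numerical values (true parameters, noise level, horizon, cost weights) are assumptions chosen for illustration.

```python
import random


def scalar_riccati(a, b, q, r, iters=500):
    """Iterate the scalar discrete-time Riccati recursion toward its fixed point."""
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return p


def lqr_gain(a, b, q, r):
    """Return k such that u = -k * x is the LQR policy for the model (a, b)."""
    p = scalar_riccati(a, b, q, r)
    return (a * b * p) / (r + b * b * p)


def estimate_ab(xs, us):
    """Least-squares fit of x_{t+1} = a x_t + b u_t via the 2x2 normal equations."""
    prev, nxt = xs[:-1], xs[1:]
    sxx = sum(x * x for x in prev)
    sxu = sum(x * u for x, u in zip(prev, us))
    suu = sum(u * u for u in us)
    sxy = sum(x * y for x, y in zip(prev, nxt))
    suy = sum(u * y for u, y in zip(us, nxt))
    det = sxx * suu - sxu * sxu
    a_hat = (sxy * suu - sxu * suy) / det
    b_hat = (sxx * suy - sxu * sxy) / det
    return a_hat, b_hat


def run_sketch(a_true=0.9, b_true=0.5, horizon=2000, noise_std=0.1, seed=0):
    """Excite the (assumed) true system with random inputs, then estimate and control."""
    rng = random.Random(seed)
    x, xs, us = 0.0, [0.0], []
    for _ in range(horizon):
        u = rng.gauss(0.0, 1.0)  # exploratory input
        x = a_true * x + b_true * u + noise_std * rng.gauss(0.0, 1.0)
        us.append(u)
        xs.append(x)
    a_hat, b_hat = estimate_ab(xs, us)
    k = lqr_gain(a_hat, b_hat, q=1.0, r=1.0)  # certainty-equivalent gain
    return a_hat, b_hat, k


if __name__ == "__main__":
    a_hat, b_hat, k = run_sketch()
    print("estimates:", a_hat, b_hat, "gain:", k)
```

A regret-minimizing scheme would interleave these two steps over time and shrink the exploratory input according to some schedule; this sketch performs a single exploration phase only.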

Ibrahimi, Morteza, Javanmard, Adel, Van Roy, Benjamin

We study the problem of adaptive control of a high-dimensional linear quadratic (LQ) system. Previous work established the asymptotic convergence to an optimal controller for various adaptive control schemes. More recently, for the average cost LQ problem, a regret bound of ${O}(\sqrt{T})$ was shown, apart from logarithmic factors. However, this bound scales exponentially with $p$, the dimension of the state space. In this work we consider the case where the matrices describing the dynamics of the LQ system are sparse and their dimensions are large. We present an adaptive control scheme that achieves a regret bound of ${O}(p \sqrt{T})$, apart from logarithmic factors. In particular, our algorithm has an average cost of $(1+\epsilon)$ times the optimum cost after $T = \mathrm{polylog}(p)\, O(1/\epsilon^2)$ time steps. This is in comparison to previous work on dense dynamics, where the algorithm requires time that scales exponentially with dimension in order to achieve regret of $\epsilon$ times the optimal cost. We believe that our result has prominent applications in the emerging area of computational advertising, in particular targeted online advertising and advertising in social networks.

Mühlegg, Maximilian (Technische Universität München) | Holzapfel, Florian (Technische Universität München) | Chowdhary, Girish (Oklahoma State University)

Autonomous unmanned aerial systems (UAS) are envisioned to become increasingly utilized in commercial airspace. In order to be attractive for commercial applications, UAS are required to undergo a quick development cycle, ensure cost effectiveness, and work reliably in changing environments. Learning-based adaptive control systems have been proposed to meet these demands. These techniques promise more flexibility when compared with traditional linear control techniques. However, no consistent verification and validation (V&V) framework exists for adaptive controllers. The underlying purpose of the V&V processes in certifying control algorithms for aircraft is to build trust in a safety-critical system. In the past, most adaptive control algorithms were designed solely to ensure stability of a model system and meet robustness requirements against selected uncertainties and disturbances. However, these assessments do not guarantee the reliable performance of the real system that the V&V process requires. The question arises of how trust can be defined for learning-based adaptive control algorithms. From our perspective, self-confidence of an adaptive flight controller will be an integral part of building trust in the system. The notion of self-confidence in the adaptive control context relates to the adaptive controller's estimate of its capability to operate reliably, and its ability to foresee the need for taking action before undesired behaviors lead to a loss of the system. In this paper we present a pathway to a possible answer to the question of how self-confidence for adaptive controllers can be achieved. In particular, we elaborate on how algorithms for diagnosis and prognosis can be integrated to help in this process.