### Regret Bounds for Robust Adaptive Control of the Linear Quadratic Regulator

We consider adaptive control of the Linear Quadratic Regulator (LQR), where an unknown linear system is controlled subject to quadratic costs. Leveraging recent developments in the estimation of linear systems and in robust controller synthesis, we present the first provably polynomial time algorithm that provides high probability guarantees of sub-linear regret on this problem. We further study the interplay between regret minimization and parameter estimation by proving a lower bound on the expected regret in terms of the exploration schedule used by any algorithm. Finally, we conduct a numerical study comparing our robust adaptive algorithm to other methods from the adaptive LQR literature, and demonstrate the flexibility of our proposed method by extending it to a demand forecasting problem subject to state constraints.
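To make the notion of regret concrete, here is a minimal sketch for a scalar system. It is not the algorithm from the paper. It assumes dynamics $x_{t+1} = a x_t + b u_t + w_t$ with stage cost $q x_t^2 + r u_t^2$, finds the optimal static gain by iterating the scalar Riccati recursion, and measures regret as the extra cost a candidate gain incurs relative to that gain on the same noise sequence. The function names and this sample-path definition of regret are illustrative assumptions.

```python
import random


def scalar_riccati(a, b, q, r, iters=1000):
    """Iterate the scalar discrete-time Riccati recursion toward its fixed point."""
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return p


def lqr_gain(a, b, q, r):
    """Optimal static feedback gain k for the control law u = -k * x."""
    p = scalar_riccati(a, b, q, r)
    return a * b * p / (r + b * b * p)


def rollout_cost(a, b, q, r, k, noise):
    """Total quadratic cost of the policy u = -k * x on a fixed noise sequence."""
    x, total = 0.0, 0.0
    for w in noise:
        u = -k * x
        total += q * x * x + r * u * u
        x = a * x + b * u + w
    return total


def empirical_regret(a, b, q, r, k, noise):
    """Extra cost of gain k relative to the optimal static gain (assumed definition)."""
    k_star = lqr_gain(a, b, q, r)
    return rollout_cost(a, b, q, r, k, noise) - rollout_cost(a, b, q, r, k_star, noise)


# Hypothetical usage: a gain designed for a mis-estimated system (a = 0.5)
# is deployed on the true system (a = 1.0).
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(1000)]
misspecified_gain = lqr_gain(0.5, 1.0, 1.0, 1.0)
print(empirical_regret(1.0, 1.0, 1.0, 1.0, misspecified_gain, noise))
```

An adaptive algorithm does not know `a` and `b` in advance, so its regret also includes the cost of the exploration needed to estimate them, which is the trade-off the lower bound in the abstract addresses.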

### Generating Adversarial Disturbances for Controller Verification

We consider the problem of generating maximally adversarial disturbances for a given controller assuming only black-box access to it. We propose an online learning approach to this problem that adaptively generates disturbances based on control inputs chosen by the controller. The goal of the disturbance generator is to minimize regret versus a benchmark disturbance-generating policy class, i.e., to make the cost incurred by the controller as close as possible to the cost that the best disturbance generator in hindsight, chosen from the benchmark class, would have induced. In the setting where the dynamics are linear and the costs are quadratic, we formulate our problem as an online trust region (OTR) problem with memory and present a new online learning algorithm (MOTR) for this problem. We prove that this method competes with the best disturbance generator in hindsight, chosen from a rich class of benchmark policies that includes linear dynamical disturbance-generating policies. We demonstrate our approach on two simulated examples: (i) synthetically generated linear systems, and (ii) generating wind disturbances for the popular PX4 controller in the AirSim simulator.
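The regret notion from the generator's side can be illustrated with a toy sketch. This is not MOTR: the scalar dynamics, the clipping bound standing in for a norm constraint on disturbances, the affine generators, and the small finite benchmark class are all assumptions made for illustration.

```python
def clip(value, bound=1.0):
    """Project a scalar disturbance onto the interval [-bound, bound]."""
    return max(-bound, min(bound, value))


def controller_cost(controller, generator, a=0.9, b=1.0, horizon=50, x0=1.0):
    """Quadratic cost the controller incurs when facing a disturbance generator."""
    x, total = x0, 0.0
    for _ in range(horizon):
        u = controller(x)       # black-box query: only the control input is observed
        w = clip(generator(u))  # the disturbance reacts to that control input
        total += x * x + u * u
        x = a * x + b * u + w
    return total


def generator_regret(controller, generator, benchmark):
    """Cost induced by the best benchmark generator minus cost induced by ours."""
    best = max(controller_cost(controller, g) for g in benchmark)
    return best - controller_cost(controller, generator)


# Hypothetical controller and benchmark class of affine generators w = m * u + c.
controller = lambda x: -0.5 * x
benchmark = [
    lambda u, m=m, c=c: m * u + c
    for m in (-2.0, 0.0, 2.0)
    for c in (-1.0, 0.0, 1.0)
]
regrets = [generator_regret(controller, g, benchmark) for g in benchmark]
print(min(regrets), max(regrets))
```

Here the regret is computed in hindsight for fixed generators. An online method has to choose each disturbance without knowing how the controller will respond later.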

### Adaptive Regret for Control of Time-Varying Dynamics

We consider regret minimization for online control with time-varying linear dynamical systems. The metric of performance we study is adaptive policy regret, or regret compared to the best policy on *any interval in time*. We give an efficient algorithm that attains first-order adaptive regret guarantees for the setting of online convex optimization with memory. We also show that these first-order bounds are nearly tight. This algorithm is then used to derive a controller with adaptive regret guarantees that provably competes with the best linear dynamical controller on any interval in time. We validate these theoretical findings experimentally on (1) simulations of time-varying linear dynamics and disturbances, and (2) the non-linear inverted pendulum benchmark.
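The difference between standard and adaptive regret can be seen on a small loss table. The sketch below is an illustration under simplifying assumptions: losses are given per round for a finite set of comparators, and the memory component of the paper's setting is ignored. The function names are hypothetical.

```python
def interval_regret(algo_losses, comparator_losses, start, end):
    """Regret on rounds start..end (inclusive) against the best fixed comparator."""
    algo = sum(algo_losses[start:end + 1])
    best = min(sum(losses[start:end + 1]) for losses in comparator_losses)
    return algo - best


def adaptive_regret(algo_losses, comparator_losses):
    """Worst-case interval regret over all contiguous intervals."""
    horizon = len(algo_losses)
    return max(
        interval_regret(algo_losses, comparator_losses, s, e)
        for s in range(horizon)
        for e in range(s, horizon)
    )


# The algorithm matches the second comparator, which is good early and bad late.
algo = [0, 0, 1, 1]
comparators = [[1, 1, 0, 0], [0, 0, 1, 1]]

standard = interval_regret(algo, comparators, 0, len(algo) - 1)
adaptive = adaptive_regret(algo, comparators)
print(standard, adaptive)  # prints "0 2"
```

Over the full horizon the algorithm looks optimal, yet on the last two rounds it loses 2 to the first comparator. Adaptive regret exposes this failure to track a changing environment, which is why it suits time-varying dynamics.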

### Logarithmic Regret Bound in Partially Observable Linear Dynamical Systems

We study the problem of adaptive control in partially observable linear dynamical systems. We propose a novel algorithm, adaptive control online learning algorithm (AdaptOn), which efficiently explores the environment, estimates the system dynamics episodically, and exploits these estimates to design effective controllers that minimize the cumulative costs. Through interaction with the environment, AdaptOn deploys online convex optimization to optimize the controller while simultaneously learning the system dynamics to improve the accuracy of controller updates. We show that when the cost functions are strongly convex, after $T$ time steps of agent-environment interaction, AdaptOn achieves a regret upper bound of $\text{polylog}\left(T\right)$. To the best of our knowledge, AdaptOn is the first algorithm that achieves $\text{polylog}\left(T\right)$ regret in adaptive control of unknown partially observable linear dynamical systems, a class that includes linear quadratic Gaussian (LQG) control.
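For reference, a partially observable linear dynamical system is commonly written in the following form. The symbols are standard notation assumed here for illustration, not taken from the abstract: $x_t$ is the hidden state, $u_t$ the control input, $y_t$ the observation, and $w_t$, $z_t$ the process and measurement noise.

$$
x_{t+1} = A x_t + B u_t + w_t, \qquad y_t = C x_t + z_t
$$

Under this notation, the regret after $T$ steps against a comparator policy $\pi_\star$ would take the form

$$
\text{Regret}(T) = \sum_{t=1}^{T} c_t(y_t, u_t) - \sum_{t=1}^{T} c_t\left(y_t^{\pi_\star}, u_t^{\pi_\star}\right),
$$

where $c_t$ is the cost at step $t$. The abstract does not specify the comparator class, so $\pi_\star$ is left abstract here.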