Bucher, Bernadette, Schmeckpeper, Karl, Matni, Nikolai, Daniilidis, Kostas

Good prediction is necessary for autonomous robots to make informed decisions in dynamic environments. The performance of a data-driven prediction model can be improved by using better sampling strategies when collecting training data. Active learning approaches to optimal sampling have been combined with the mathematically general approaches to incentivizing exploration presented in the curiosity literature via model-based formulations of curiosity. We present an adversarial curiosity method which maximizes a score given by a discriminator network. This score gives a measure of prediction certainty, enabling our approach to sample sequences of observations and actions which result in outcomes considered the least realistic by the discriminator. We demonstrate the ability of our active sampling method to achieve higher prediction performance and higher sample efficiency in a domain transfer problem for robotic manipulation tasks. We also present a validation dataset of action-conditioned video of robotic manipulation tasks on which we test the prediction performance of our trained models.

Dean, Sarah, Mania, Horia, Matni, Nikolai, Recht, Benjamin, Tu, Stephen

We consider adaptive control of the Linear Quadratic Regulator (LQR), where an unknown linear system is controlled subject to quadratic costs. Leveraging recent developments in the estimation of linear systems and in robust controller synthesis, we present the first provably polynomial time algorithm that achieves sub-linear regret on this problem. We further study the interplay between regret minimization and parameter estimation by proving a lower bound on the expected regret in terms of the exploration schedule used by any algorithm. Finally, we conduct a numerical study comparing our robust adaptive algorithm to other methods from the adaptive LQR literature, and demonstrate the flexibility of our proposed method by extending it to a demand forecasting problem subject to state constraints. Papers published at the Neural Information Processing Systems Conference.

Park, Sangdon, Bastani, Osbert, Matni, Nikolai, Lee, Insup

We propose an algorithm combining calibrated prediction and generalization bounds from learning theory to construct confidence sets for deep neural networks with PAC guarantees---i.e., the confidence set for a given input contains the true label with high probability. We demonstrate how our approach can be used to construct PAC confidence sets on ResNet for ImageNet, a visual object tracking model, and a dynamics model for the half-cheetah reinforcement learning problem.

Tsiamis, Anastasios, Matni, Nikolai, Pappas, George J.

In this paper, we consider the task of designing a Kalman Filter (KF) for an unknown and partially observed autonomous linear time invariant system driven by process and sensor noise. To do so, we propose studying the following two-step process: first, using system identification tools rooted in subspace methods, we obtain coarse finite-data estimates of the state-space parameters and Kalman gain describing the autonomous system; and second, we use these approximate parameters to design a filter which produces estimates of the system state. We show that when the system identification step produces sufficiently accurate estimates, or when the underlying true KF is sufficiently robust, a Certainty Equivalent (CE) KF, i.e., one designed using the estimated parameters directly, enjoys provable sub-optimality guarantees. We further show that when these conditions fail, and in particular, when the CE KF is marginally stable (i.e., has eigenvalues very close to the unit circle), imposing additional robustness constraints on the filter leads to similar sub-optimality guarantees. We further show that with high probability, both the CE and robust filters have mean prediction error bounded by $\tilde O(1/\sqrt{N})$, where $N$ is the number of data points collected in the system identification step. To the best of our knowledge, these are the first end-to-end sample complexity bounds for the Kalman Filtering of an unknown system.
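
The certainty-equivalent idea described above can be illustrated on a scalar system. The sketch below is not the paper's method: it assumes a hypothetical scalar model $x_{t+1} = a x_t + w_t$, $y_t = c x_t + v_t$, treats the noise variances `Q` and `R` as known, and replaces the subspace identification step with a hand-picked coarse estimate `A_HAT`. All names and numbers are illustrative assumptions. It computes the steady-state predictor gain from the true and the estimated parameters, then compares the empirical output prediction error of the two filters on the same simulated data.

```python
import random

def steady_state_gain(a, c, q, r, iters=1000):
    """Iterate the scalar Riccati recursion; return (P, L) for the one-step predictor."""
    p = q
    for _ in range(iters):
        p = a * a * p - (a * c * p) ** 2 / (c * c * p + r) + q
    return p, a * c * p / (c * c * p + r)

def innovation_mse(a_true, c_true, q, r, a_model, c_model, gain, steps=20000, seed=0):
    """Run the predictor built from (a_model, c_model, gain) on data generated by
    the true system; return the empirical mean squared output prediction error."""
    rng = random.Random(seed)
    x, x_hat, total = 0.0, 0.0, 0.0
    for _ in range(steps):
        y = c_true * x + rng.gauss(0.0, r ** 0.5)       # noisy measurement
        innovation = y - c_model * x_hat                # output prediction error
        total += innovation ** 2
        x_hat = a_model * x_hat + gain * innovation     # predictor update
        x = a_true * x + rng.gauss(0.0, q ** 0.5)       # true state transition
    return total / steps

# Hypothetical true system and a coarse estimate of it (illustrative numbers only).
A, C, Q, R = 0.9, 1.0, 1.0, 1.0
A_HAT, C_HAT = 0.85, 1.0

_, gain_opt = steady_state_gain(A, C, Q, R)
_, gain_ce = steady_state_gain(A_HAT, C_HAT, Q, R)
mse_opt = innovation_mse(A, C, Q, R, A, C, gain_opt)
mse_ce = innovation_mse(A, C, Q, R, A_HAT, C_HAT, gain_ce)
print(f"optimal gain {gain_opt:.4f}, CE gain {gain_ce:.4f}")
print(f"output prediction MSE: optimal {mse_opt:.4f}, CE {mse_ce:.4f}")
```

Both runs use the same seed, so the two filters see identical noise sequences and the gap between the two errors reflects only the parameter mismatch.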

Fattahi, Salar, Matni, Nikolai, Sojoudi, Somayeh

In this work, we propose a robust approach to design distributed controllers for unknown-but-sparse linear and time-invariant systems. By leveraging modern techniques in distributed controller synthesis and structured linear inverse problems as applied to system identification, we show that near-optimal distributed controllers can be learned with sub-linear sample complexity and computed with near-linear time complexity, both measured with respect to the dimension of the system. In particular, we provide sharp end-to-end guarantees on the stability and the performance of the designed distributed controller and prove that for sparse systems, the number of samples needed to guarantee robust and near-optimal performance of the designed controller can be significantly smaller than the dimension of the system. Finally, we show that the proposed optimization problem can be solved to global optimality with near-linear time complexity by iteratively solving a series of small quadratic programs.

Dean, Sarah, Matni, Nikolai, Recht, Benjamin, Ye, Vickie

Motivated by vision-based control of autonomous vehicles, we consider the problem of controlling a known linear dynamical system for which partial state information, such as vehicle position, can only be extracted from high-dimensional data, such as an image. Our approach is to learn a perception map from high-dimensional data to partial-state observation and its corresponding error profile, and then design a robust controller. We show that under suitable smoothness assumptions on the perception map and generative model relating state to high-dimensional data, an affine error model is sufficiently rich to capture all possible error profiles, and can further be learned via a robust regression problem. We then show how to integrate the learned perception map and error model into a novel robust control synthesis procedure, and prove that the resulting perception and control loop has favorable generalization properties. Finally, we illustrate the usefulness of our approach on a synthetic example and on the self-driving car simulation platform CARLA.

Matni, Nikolai, Tu, Stephen

We provide a brief tutorial on the use of concentration inequalities as they apply to system identification of state-space parameters of linear time invariant systems, with a focus on the fully observed setting. We draw upon tools from the theories of large-deviations and self-normalized martingales, and provide both data-dependent and independent bounds on the learning rate.

I. INTRODUCTION

A key feature in modern reinforcement learning is the ability to provide high-probability guarantees on the finite-data/time behavior of an algorithm acting on a system. The enabling technical tools used in providing such guarantees are concentration of measure results, which should be interpreted as quantitative versions of the strong law of large numbers. This paper provides a brief introduction to such tools, as motivated by the identification of linear-time-invariant (LTI) systems.
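
As a concrete instance of the fully observed identification problem mentioned above, the following minimal sketch fits a scalar system $x_{t+1} = a x_t + b u_t + w_t$ by ordinary least squares from a single trajectory. The scalar model, the Gaussian inputs, the parameter values, and the function names are all assumptions made for illustration; the tutorial itself treats general state-space models and derives bounds that this sketch does not compute.

```python
import random

def simulate(a, b, noise_std, steps, seed=0):
    """Generate one trajectory of x_{t+1} = a*x_t + b*u_t + w_t with Gaussian inputs."""
    rng = random.Random(seed)
    xs, us = [0.0], []
    for _ in range(steps):
        u = rng.gauss(0.0, 1.0)
        us.append(u)
        xs.append(a * xs[-1] + b * u + rng.gauss(0.0, noise_std))
    return xs, us

def least_squares(xs, us):
    """Solve the 2x2 normal equations for (a, b) in closed form."""
    prev, nxt = xs[:-1], xs[1:]
    sxx = sum(x * x for x in prev)
    sxu = sum(x * u for x, u in zip(prev, us))
    suu = sum(u * u for u in us)
    sxy = sum(x * y for x, y in zip(prev, nxt))
    suy = sum(u * y for u, y in zip(us, nxt))
    det = sxx * suu - sxu * sxu
    a_hat = (suu * sxy - sxu * suy) / det
    b_hat = (sxx * suy - sxu * sxy) / det
    return a_hat, b_hat

# Hypothetical true parameters (illustrative only).
A_TRUE, B_TRUE, NOISE_STD = 0.8, 1.0, 0.1

for n in (100, 10000):
    xs, us = simulate(A_TRUE, B_TRUE, NOISE_STD, n)
    a_hat, b_hat = least_squares(xs, us)
    print(f"N={n}: |a_hat - a| = {abs(a_hat - A_TRUE):.4f}, "
          f"|b_hat - b| = {abs(b_hat - B_TRUE):.4f}")
```

Concentration results of the kind surveyed in the tutorial are what turn the observed shrinkage of these errors with the trajectory length into a high-probability statement.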

Matni, Nikolai, Proutiere, Alexandre, Rantzer, Anders, Tu, Stephen

Machine and reinforcement learning (RL) are being applied to plan and control the behavior of autonomous systems interacting with the physical world -- examples include self-driving vehicles, distributed sensor networks, and agile robots. However, if machine learning is to be applied in these new settings, the resulting algorithms must come with the reliability, robustness, and safety guarantees that are hallmarks of the control theory literature, as failures could be catastrophic. Thus, as RL algorithms are increasingly and more aggressively deployed in safety-critical settings, it is imperative that control theorists be part of the conversation. The goal of this tutorial paper is to provide a jumping-off point for control theorists wishing to work on RL-related problems by covering recent advances in bridging learning and control theory, and by placing these results within the appropriate historical context of the system identification and adaptive control literatures.

Fattahi, Salar, Matni, Nikolai, Sojoudi, Somayeh

This paper addresses the problem of identifying sparse linear time-invariant (LTI) systems from a single sample trajectory generated by the system dynamics. We introduce a Lasso-like estimator for the parameters of the system, taking into account their sparse nature. Assuming that the system is stable, or that it is equipped with an initial stabilizing controller, we provide sharp finite-time guarantees on the accurate recovery of both the sparsity structure and the parameter values of the system. In particular, we show that the proposed estimator can correctly identify the sparsity pattern of the system matrices with high probability, provided that the length of the sample trajectory exceeds a threshold. Furthermore, we show that this threshold scales polynomially in the number of nonzero elements in the system matrices, but logarithmically in the system dimensions --- this improves on existing sample complexity bounds for the sparse system identification problem. We further extend these results to obtain sharp bounds on the $\ell_{\infty}$-norm of the estimation error and show how different properties of the system---such as its stability level and \textit{mutual incoherency}---affect this bound. Finally, an extensive case study on power systems is presented to illustrate the performance of the proposed estimation method.
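
To make the Lasso-like estimator above concrete, here is a minimal sketch under stated assumptions: an autonomous, stable system $x_{t+1} = A x_t + w_t$, one $\ell_1$-penalized regression per row of $A$, and a plain coordinate-descent solver. The 3-state matrix `A_TRUE`, the penalty `LAM`, and all function names are hypothetical choices for illustration and are not taken from the paper, whose estimator and tuning may differ.

```python
import random

def simulate(A, steps, noise_std=1.0, seed=0):
    """Single trajectory of x_{t+1} = A x_t + w_t, starting from the origin."""
    rng = random.Random(seed)
    n = len(A)
    x = [0.0] * n
    traj = [x]
    for _ in range(steps):
        x = [sum(A[i][j] * x[j] for j in range(n)) + rng.gauss(0.0, noise_std)
             for i in range(n)]
        traj.append(x)
    return traj

def soft_threshold(z, lam):
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_row(G, c, lam, sweeps=200):
    """Coordinate descent for min 0.5*th'G th - c'th + lam*||th||_1."""
    n = len(c)
    theta = [0.0] * n
    for _ in range(sweeps):
        for j in range(n):
            rho = c[j] - sum(G[j][k] * theta[k] for k in range(n) if k != j)
            theta[j] = soft_threshold(rho, lam) / G[j][j]
    return theta

def sparse_identify(traj, lam):
    """Estimate each row of A with a Lasso regression of x_{t+1}[i] on x_t."""
    T, n = len(traj) - 1, len(traj[0])
    # Empirical Gram matrix of the regressors, shared by all rows.
    G = [[sum(traj[t][j] * traj[t][k] for t in range(T)) / T for k in range(n)]
         for j in range(n)]
    A_hat = []
    for i in range(n):
        c = [sum(traj[t][j] * traj[t + 1][i] for t in range(T)) / T
             for j in range(n)]
        A_hat.append(lasso_row(G, c, lam))
    return A_hat

# Hypothetical sparse, stable system and penalty (illustrative only).
A_TRUE = [[0.5, 0.0, 0.0], [0.4, 0.5, 0.0], [0.0, 0.0, 0.5]]
LAM = 0.1

A_HAT = sparse_identify(simulate(A_TRUE, 20000), LAM)
for row in A_HAT:
    print([round(v, 3) for v in row])
```

The soft-thresholding step sets small coefficients exactly to zero, so the sparsity pattern can be read off the estimate directly; the price is a shrinkage bias on the nonzero entries that grows with the penalty.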

Dean, Sarah, Mania, Horia, Matni, Nikolai, Recht, Benjamin, Tu, Stephen

We consider adaptive control of the Linear Quadratic Regulator (LQR), where an unknown linear system is controlled subject to quadratic costs. Leveraging recent developments in the estimation of linear systems and in robust controller synthesis, we present the first provably polynomial time algorithm that provides high probability guarantees of sub-linear regret on this problem. We further study the interplay between regret minimization and parameter estimation by proving a lower bound on the expected regret in terms of the exploration schedule used by any algorithm. Finally, we conduct a numerical study comparing our robust adaptive algorithm to other methods from the adaptive LQR literature, and demonstrate the flexibility of our proposed method by extending it to a demand forecasting problem subject to state constraints.
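
Regret in this setting is measured against the cost of the optimal LQR controller for the true system. The sketch below illustrates that baseline on a hypothetical scalar system $x_{t+1} = a x_t + b u_t + w_t$ with stage cost $q x_t^2 + r u_t^2$ and unit-variance noise. It is not the robust adaptive algorithm from the abstract: it only compares the steady-state average cost of the optimal gain with that of a gain designed from a fixed, hand-picked misestimate `A_HAT`. All names and numbers are assumptions for illustration.

```python
def lqr_scalar(a, b, q, r, iters=500):
    """Iterate the scalar discrete-time Riccati equation; return (P, K) with u = -K x."""
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    k = a * b * p / (r + b * b * p)
    return p, k

def average_cost(a, b, q, r, k, noise_var=1.0):
    """Steady-state average cost of the static feedback u = -k x on the true system."""
    closed = a - b * k
    if abs(closed) >= 1.0:
        raise ValueError("gain does not stabilize the system")
    state_var = noise_var / (1.0 - closed * closed)
    return (q + r * k * k) * state_var

# Hypothetical true system and a fixed misestimate of it (illustrative only).
A, B, Q, R = 1.0, 1.0, 1.0, 1.0
A_HAT = 1.1

p_opt, k_opt = lqr_scalar(A, B, Q, R)
_, k_ce = lqr_scalar(A_HAT, B, Q, R)
cost_opt = average_cost(A, B, Q, R, k_opt)
cost_ce = average_cost(A, B, Q, R, k_ce)
excess = cost_ce - cost_opt
print(f"optimal gain {k_opt:.4f}, average cost {cost_opt:.4f}")
print(f"mismatched gain {k_ce:.4f}, average cost {cost_ce:.4f}")
print(f"excess cost per step {excess:.4f}")
```

A controller that never updates its gain pays this excess at every step, so its expected regret grows linearly in time. Sub-linear regret therefore requires the parameter estimates, and with them the gain, to keep improving as data accumulates.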