Goto

Collaborating Authors

 Umenberger, Jack


Improved Sample Complexity of Imitation Learning for Barrier Model Predictive Control

arXiv.org Artificial Intelligence

Imitation learning has emerged as a powerful tool in machine learning, enabling agents to learn complex behaviors by imitating expert demonstrations acquired either from a human demonstrator or a policy computed offline [3, 11, 12, 13]. Despite its significant success, imitation learning often suffers from a compounding error problem: Successive evaluations of the approximate policy could accumulate error, resulting in out-of-distribution failures [3]. Recent results in imitation learning [31, 32, 34] have identified smoothness (i.e., Lipschitzness of the derivative of the optimal controller with respect to the initial state) and stability of the expert as two key properties that circumvent this issue, thereby allowing for end-to-end performance guarantees for the final learned controller. In this paper, our focus is on enabling such guarantees when the expert being imitated is a Model Predictive Controller (MPC), a powerful class of control algorithms based on solving an optimization problem over a receding prediction horizon [23]. In some cases, the solution to this multiparametric optimization problem, known as the explicit MPC representation [6], can be pre-computed. For instance, in our setup -- linear systems with polytopic constraints -- the optimal control input is a piecewise affine (and, hence, highly non-smooth) function of the state [6].
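
As a rough illustration of the setup described in this abstract (linear dynamics, polytopic constraints, receding horizon), the sketch below solves the MPC quadratic program online with cvxpy. It is not the paper's barrier formulation or imitation scheme; all matrices, bounds, and the horizon are placeholder values.

```python
# Minimal receding-horizon MPC sketch for a linear system with box (polytopic)
# constraints. All numerical values (A, B, Q, R, horizon, bounds) are illustrative
# placeholders, not taken from the paper.
import numpy as np
import cvxpy as cp

A = np.array([[1.0, 0.1], [0.0, 1.0]])   # double-integrator-like dynamics
B = np.array([[0.0], [0.1]])
Q, R = np.eye(2), 0.1 * np.eye(1)
N = 10                                    # prediction horizon
u_max, x_max = 1.0, 5.0

def mpc_control(x0):
    """Solve the finite-horizon QP and return the first optimal input."""
    x = cp.Variable((2, N + 1))
    u = cp.Variable((1, N))
    cost, constraints = 0, [x[:, 0] == x0]
    for t in range(N):
        cost += cp.quad_form(x[:, t], Q) + cp.quad_form(u[:, t], R)
        constraints += [
            x[:, t + 1] == A @ x[:, t] + B @ u[:, t],
            cp.abs(u[:, t]) <= u_max,
            cp.abs(x[:, t + 1]) <= x_max,
        ]
    cp.Problem(cp.Minimize(cost), constraints).solve()
    return u[:, 0].value

print(mpc_control(np.array([2.0, 0.0])))  # first control move from x0
```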


Smooth Model Predictive Control with Applications to Statistical Learning

arXiv.org Artificial Intelligence

Approximating complex state-feedback controllers by parametric deep neural network models is a straightforward technique for reducing the computational overhead of complex control policies, particularly in the context of Model Predictive Control (MPC). Learning a feedback controller to imitate an MPC policy over a given state distribution can overcome the limitations of both the implicit (online) and explicit (offline) variants of MPC. Implicit MPC uses an iterative numerical solver to obtain the optimal solution, which can be intractable in real time for high-dimensional systems with complex dynamics. Conversely, explicit MPC finds an offline formulation of the MPC controller via multi-parametric programming that can be quickly queried, but whose explicit representation scales poorly in the problem dimensions. Imitation learning (i.e., finding a feedback controller which approximates and performs similarly to the MPC policy) can transcend these limitations by using the computationally expensive iterative numerical solver offline to learn a cheaply queryable, approximate policy solely over the state distribution relevant to the control problem, thereby bypassing the need to store the exact policy representation over the entire state domain. For continuous control problems, where approximately optimal control inputs are sufficient to solve the task, imitation learning is a direct path toward computationally inexpensive controllers that solve difficult, high-dimensional control problems in real time.
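
A minimal sketch of the offline imitation pipeline this abstract describes: query an expensive expert controller on states sampled from the distribution of interest, then fit a cheap parametric regressor to the recorded actions. Everything below is illustrative; a saturated linear law stands in for the MPC solver, and scikit-learn's MLPRegressor stands in for a deep network.

```python
# Imitation-learning sketch: fit a cheap regressor to (state, expert action) pairs
# collected offline. The "expert" here is a placeholder saturated linear law standing
# in for the iterative numerical MPC solver used in practice.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
K = np.array([[1.2, 1.8]])                      # placeholder gain (not from the paper)

def expert_policy(x):
    return np.clip(-K @ x, -1.0, 1.0)           # stands in for the online MPC solve

# 1) Offline data collection over the state distribution of interest.
X = rng.uniform(-3.0, 3.0, size=(5000, 2))
U = np.array([expert_policy(x).ravel() for x in X])

# 2) Supervised regression onto the expert actions.
policy = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
policy.fit(X, U.ravel())

# 3) Cheap online evaluation of the learned policy vs. the expert.
x_test = np.array([[1.5, -0.5]])
print(policy.predict(x_test), expert_policy(x_test.T).ravel())
```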


Stabilizing Dynamical Systems via Policy Gradient Methods

arXiv.org Machine Learning

Stabilizing an unknown control system is one of the most fundamental problems in control systems engineering. In this paper, we provide a simple, model-free algorithm for stabilizing fully observed dynamical systems. While model-free methods have become increasingly popular in practice due to their simplicity and flexibility, stabilization via direct policy search has received surprisingly little attention. Our algorithm proceeds by solving a series of discounted LQR problems, where the discount factor is gradually increased. We prove that this method efficiently recovers a stabilizing controller for linear systems, and for smooth, nonlinear systems within a neighborhood of their equilibria. Our approach overcomes a significant limitation of prior work, namely the need for a stabilizing control policy to be known in advance. We empirically evaluate the effectiveness of our approach on common control benchmarks.
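
The discount-annealing idea can be sketched in a model-based form: the γ-discounted LQR problem is equivalent to an undiscounted LQR on dynamics scaled by √γ, and the discount can be safely enlarged using the spectral radius of the discounted closed loop. The paper's algorithm does this model-free via policy gradient; the sketch below only illustrates the annealing schedule on placeholder matrices.

```python
# Discount-annealing sketch for stabilizing an (here, known) unstable linear system:
# solve a sequence of discounted LQR problems, increasing the discount factor until
# the resulting gain stabilizes the undiscounted system. The paper's algorithm is
# model-free (policy gradient); this model-based version shows only the schedule.
import numpy as np
from scipy.linalg import solve_discrete_are

A = np.array([[1.2, 0.5], [0.0, 1.1]])    # placeholder unstable dynamics
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)

def discounted_lqr_gain(gamma):
    # gamma-discounted LQR == undiscounted LQR with dynamics scaled by sqrt(gamma)
    Ag, Bg = np.sqrt(gamma) * A, np.sqrt(gamma) * B
    P = solve_discrete_are(Ag, Bg, Q, R)
    return np.linalg.solve(R + Bg.T @ P @ Bg, Bg.T @ P @ Ag)

gamma = 0.3
while True:
    K = discounted_lqr_gain(gamma)
    rho = max(abs(np.linalg.eigvals(A - B @ K)))   # undiscounted closed loop under u = -Kx
    if rho < 1.0:
        break                                      # K stabilizes the true system
    r = np.sqrt(gamma) * rho                       # discounted closed loop is stable: r < 1
    gamma = min(1.0, gamma / r**2)                 # enlarge the discount and repeat

print(f"stabilizing gain found at gamma = {gamma:.3f}, spectral radius = {rho:.3f}")
```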


Robust exploration in linear quadratic reinforcement learning

Neural Information Processing Systems

Learning to make decisions in an uncertain and dynamic environment is a task of fundamental importance in a number of domains. This paper concerns the problem of learning control policies for an unknown linear dynamical system so as to minimize a quadratic cost function. We present a method, based on convex optimization, that accomplishes this task 'robustly', i.e., the worst-case cost, accounting for system uncertainty given the observed data, is minimized. The method balances exploitation and exploration, exciting the system so as to reduce uncertainty in the model parameters to which the worst-case cost is most sensitive. Numerical simulations and application to a hardware-in-the-loop servo-mechanism are used to demonstrate the approach, with appreciable performance and robustness gains over alternative methods observed in both.
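
To make the worst-case objective concrete, the following is a crude scenario-based stand-in (not the paper's convex program): excite the system, fit (A, B) by least squares, sample plausible models around the estimate, and choose, from a small grid of candidate gains, the one with the smallest worst-case LQR cost over those samples. All numbers are placeholders.

```python
# Scenario-based stand-in for worst-case (robust) controller selection. The paper
# solves a convex program; this sketch only illustrates the objective.
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

rng = np.random.default_rng(1)
A_true = np.array([[0.9, 0.4], [0.0, 0.8]])      # placeholder system, unknown in practice
B_true = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)

# (i) Collect data with an exciting (random) input and fit [A B] by least squares.
T, x = 200, np.zeros(2)
X, Z = [], []
for _ in range(T):
    u = rng.normal(size=1)                        # exploratory excitation
    x_next = A_true @ x + B_true @ u + 0.05 * rng.normal(size=2)
    Z.append(np.concatenate([x, u])); X.append(x_next); x = x_next
Z, X = np.array(Z), np.array(X)
AB_hat = np.linalg.lstsq(Z, X, rcond=None)[0].T   # [A_hat B_hat]

def lqr_cost(A, B, K):
    """Infinite-horizon LQR cost of u = -Kx (unit-covariance start), inf if unstable."""
    Acl = A - B @ K
    if max(abs(np.linalg.eigvals(Acl))) >= 1.0:
        return np.inf
    return np.trace(solve_discrete_lyapunov(Acl.T, Q + K.T @ R @ K))

# (ii) Sample plausible models around the estimate; (iii) pick the candidate gain
# whose worst-case cost over those samples is smallest.
models = [(AB_hat[:, :2] + 0.05 * rng.normal(size=(2, 2)),
           AB_hat[:, 2:] + 0.05 * rng.normal(size=(2, 1))) for _ in range(50)]
candidates = [np.array([[k1, k2]])
              for k1 in np.linspace(0, 1, 11) for k2 in np.linspace(0, 2, 11)]
K_robust = min(candidates, key=lambda K: max(lqr_cost(A, B, K) for A, B in models))
print(K_robust)
```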


Learning convex bounds for linear quadratic control policy synthesis

Neural Information Processing Systems

Learning to make decisions from observed data in dynamic environments remains a problem of fundamental importance in a number of fields, from artificial intelligence and robotics, to medicine and finance. This paper concerns the problem of learning control policies for unknown linear dynamical systems so as to maximize a quadratic reward function. We present a method to optimize the expected value of the reward over the posterior distribution of the unknown system parameters, given data. The algorithm involves sequential convex programming, and enjoys reliable local convergence and robust stability guarantees. Numerical simulations and stabilization of a real-world inverted pendulum are used to demonstrate the approach, with strong performance and robustness properties observed in both.
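
The objective here differs from the worst-case criterion above in that the cost is averaged over the posterior. A Monte Carlo sketch of that expected-cost objective is given below; the paper instead optimizes a convex upper bound via sequential convex programming, and all values shown are placeholders.

```python
# Posterior-averaged (rather than worst-case) policy evaluation: sample systems from
# a posterior over (A, B) and score a gain by its average LQR cost. This sketch only
# illustrates the objective, not the paper's convex-bound optimization.
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

rng = np.random.default_rng(2)
A_hat = np.array([[0.9, 0.4], [0.0, 0.8]])       # posterior mean (placeholder)
B_hat = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)

def cost(A, B, K):
    Acl = A - B @ K
    if max(abs(np.linalg.eigvals(Acl))) >= 1.0:
        return np.inf                             # unstable closed loop
    return np.trace(solve_discrete_lyapunov(Acl.T, Q + K.T @ R @ K))

def expected_cost(K, n_samples=200, scale=0.03):
    """Monte Carlo estimate of the expected cost over a Gaussian posterior around (A_hat, B_hat)."""
    samples = ((A_hat + scale * rng.normal(size=(2, 2)),
                B_hat + scale * rng.normal(size=(2, 1))) for _ in range(n_samples))
    return np.mean([cost(A, B, K) for A, B in samples])

print(expected_cost(np.array([[0.5, 1.0]])))      # score one candidate gain
```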


Robust exploration in linear quadratic reinforcement learning

arXiv.org Machine Learning

This paper concerns the problem of learning control policies for an unknown linear dynamical system to minimize a quadratic cost function. We present a method, based on convex optimization, that accomplishes this task robustly; that is, we minimize the worst-case cost, accounting for system uncertainty given the observed data. The method balances exploitation and exploration, exciting the system so as to reduce uncertainty in the model parameters to which the worst-case cost is most sensitive. Numerical simulations and application to a hardware-in-the-loop servo-mechanism demonstrate the approach, with appreciable performance and robustness gains over alternative methods observed in both.


Nonlinear input design as optimal control of a Hamiltonian system

arXiv.org Machine Learning

We propose an input design method for a general class of parametric probabilistic models, including nonlinear dynamical systems with process noise. The goal of the procedure is to select inputs such that the parameter posterior distribution concentrates about the true value of the parameters; however, exact computation of the posterior is intractable. By representing (samples from) the posterior as trajectories from a certain Hamiltonian system, we transform the input design task into an optimal control problem. The method is illustrated via numerical examples, including MRI pulse sequence design.
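
As a much simpler surrogate for the goal stated here (inputs that concentrate the posterior about the true parameters), the sketch below scores a few candidate input sequences by the Fisher information of a toy scalar model and picks the best. This is not the paper's Hamiltonian / optimal-control formulation; the model and all values are placeholders.

```python
# Input-design sketch: among a few candidate input sequences, select the one that
# maximizes a Fisher-information proxy for posterior concentration. Toy model:
# y_t = exp(-theta * u_t) + e_t, with e_t ~ N(0, sigma^2).
import numpy as np

theta_nominal, sigma = 0.7, 0.1           # nominal parameter and noise std (placeholders)

def fisher_information(u, theta=theta_nominal):
    # I(theta) = sum_t (d/dtheta exp(-theta * u_t))^2 / sigma^2
    dydtheta = -u * np.exp(-theta * u)
    return np.sum(dydtheta**2) / sigma**2

candidates = {
    "constant": np.full(20, 1.0),
    "ramp":     np.linspace(0.0, 2.0, 20),
    "impulses": np.tile([0.0, 2.0], 10),
}
best = max(candidates, key=lambda name: fisher_information(candidates[name]))
for name, u in candidates.items():
    print(f"{name:9s} I(theta) = {fisher_information(u):8.1f}")
print("selected input:", best)
```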


Learning convex bounds for linear quadratic control policy synthesis

arXiv.org Machine Learning

Learning to make decisions from observed data in dynamic environments remains a problem of fundamental importance in a number of fields, from artificial intelligence and robotics, to medicine and finance. This paper concerns the problem of learning control policies for unknown linear dynamical systems so as to maximize a quadratic reward function. We present a method to optimize the expected value of the reward over the posterior distribution of the unknown system parameters, given data. The algorithm involves sequential convex programming, and enjoys reliable local convergence and robust stability guarantees. Numerical simulations and stabilization of a real-world inverted pendulum are used to demonstrate the approach, with strong performance and robustness properties observed in both.