Goto

Collaborating Authors

 Umenberger, Jack


Improved Sample Complexity of Imitation Learning for Barrier Model Predictive Control

arXiv.org Artificial Intelligence

Imitation learning has emerged as a powerful tool in machine learning, enabling agents to learn complex behaviors by imitating expert demonstrations acquired either from a human demonstrator or a policy computed offline [3, 11, 12, 13]. Despite its significant success, imitation learning often suffers from a compounding error problem: Successive evaluations of the approximate policy could accumulate error, resulting in out-of-distribution failures [3]. Recent results in imitation learning [31, 32, 34] have identified smoothness (i.e., Lipschitzness of the derivative of the optimal controller with respect to the initial state) and stability of the expert as two key properties that circumvent this issue, thereby allowing for end-to-end performance guarantees for the final learned controller. In this paper, our focus is on enabling such guarantees when the expert being imitated is a Model Predictive Controller (MPC), a powerful class of control algorithms based on solving an optimization problem over a receding prediction horizon [23]. In some cases, the solution to this multiparametric optimization problem, known as the explicit MPC representation [6], can be pre-computed. For instance, in our setup -- linear systems with polytopic constraints -- the optimal control input is a piecewise affine (and, hence, highly non-smooth) function of the state [6].
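
As a rough illustration of the setup described in this abstract (linear dynamics, polytopic constraints, receding horizon), the sketch below solves the MPC quadratic program online with cvxpy. It is not the paper's barrier formulation or imitation scheme; all matrices, bounds, and the horizon are placeholder values.

```python
# Minimal receding-horizon MPC sketch for a linear system with box (polytopic)
# constraints. All numerical values (A, B, Q, R, horizon, bounds) are illustrative
# placeholders, not taken from the paper.
import numpy as np
import cvxpy as cp

A = np.array([[1.0, 0.1], [0.0, 1.0]])   # double-integrator-like dynamics
B = np.array([[0.0], [0.1]])
Q, R = np.eye(2), 0.1 * np.eye(1)
N = 10                                    # prediction horizon
u_max, x_max = 1.0, 5.0

def mpc_control(x0):
    """Solve the finite-horizon QP and return the first optimal input."""
    x = cp.Variable((2, N + 1))
    u = cp.Variable((1, N))
    cost, constraints = 0, [x[:, 0] == x0]
    for t in range(N):
        cost += cp.quad_form(x[:, t], Q) + cp.quad_form(u[:, t], R)
        constraints += [
            x[:, t + 1] == A @ x[:, t] + B @ u[:, t],
            cp.abs(u[:, t]) <= u_max,
            cp.abs(x[:, t + 1]) <= x_max,
        ]
    cp.Problem(cp.Minimize(cost), constraints).solve()
    return u[:, 0].value

print(mpc_control(np.array([2.0, 0.0])))  # first control move from x0
```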


Smooth Model Predictive Control with Applications to Statistical Learning

arXiv.org Artificial Intelligence

Approximating complex state-feedback controllers by parametric deep neural network models is a straightforward technique for reducing the computational overhead of complex control policies, particularly in the context of Model Predictive Control (MPC). Learning a feedback controller to imitate an MPC policy over a given state distribution can overcome the limitations of both the implicit (online) and explicit (offline) variants of MPC. Implicit MPC uses an iterative numerical solver to obtain the optimal solution, which can be intractable in real time for high-dimensional systems with complex dynamics. Conversely, explicit MPC finds an offline formulation of the MPC controller via multi-parametric programming that can be quickly queried, but whose explicit representation scales poorly in the problem dimensions. Imitation learning (i.e., finding a feedback controller which approximates and performs similarly to the MPC policy) can transcend these limitations by using the computationally expensive iterative numerical solver offline to learn a cheaply queryable, approximate policy solely over the state distribution relevant to the control problem, thereby bypassing the need to store the exact policy representation over the entire state domain. For continuous control problems, where approximately optimal control inputs are sufficient to solve the task, imitation learning is a direct path toward computationally inexpensive controllers that solve difficult, high-dimensional control problems in real time.
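
A minimal sketch of the offline imitation pipeline this abstract describes: query an expensive expert controller on states sampled from the distribution of interest, then fit a cheap parametric regressor to the recorded actions. Everything below is illustrative; a saturated linear law stands in for the MPC solver, and scikit-learn's MLPRegressor stands in for a deep network.

```python
# Imitation-learning sketch: fit a cheap regressor to (state, expert action) pairs
# collected offline. The "expert" here is a placeholder saturated linear law standing
# in for the iterative numerical MPC solver used in practice.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
K = np.array([[1.2, 1.8]])                      # placeholder gain (not from the paper)

def expert_policy(x):
    return np.clip(-K @ x, -1.0, 1.0)           # stands in for the online MPC solve

# 1) Offline data collection over the state distribution of interest.
X = rng.uniform(-3.0, 3.0, size=(5000, 2))
U = np.array([expert_policy(x).ravel() for x in X])

# 2) Supervised regression onto the expert actions.
policy = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
policy.fit(X, U.ravel())

# 3) Cheap online evaluation of the learned policy vs. the expert.
x_test = np.array([[1.5, -0.5]])
print(policy.predict(x_test), expert_policy(x_test.T).ravel())
```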


Stabilizing Dynamical Systems via Policy Gradient Methods

arXiv.org Machine Learning

Stabilizing an unknown control system is one of the most fundamental problems in control systems engineering. In this paper, we provide a simple, model-free algorithm for stabilizing fully observed dynamical systems. While model-free methods have become increasingly popular in practice due to their simplicity and flexibility, stabilization via direct policy search has received surprisingly little attention. Our algorithm proceeds by solving a series of discounted LQR problems, where the discount factor is gradually increased. We prove that this method efficiently recovers a stabilizing controller for linear systems, and for smooth, nonlinear systems within a neighborhood of their equilibria. Our approach overcomes a significant limitation of prior work, namely the need for a stabilizing control policy to be known in advance. We empirically evaluate the effectiveness of our approach on common control benchmarks.
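
The discount-annealing idea can be sketched in a model-based form: the γ-discounted LQR problem is equivalent to an undiscounted LQR on dynamics scaled by √γ, and the discount can be safely enlarged using the spectral radius of the discounted closed loop. The paper's algorithm does this model-free via policy gradient; the sketch below only illustrates the annealing schedule on placeholder matrices.

```python
# Discount-annealing sketch for stabilizing an (here, known) unstable linear system:
# solve a sequence of discounted LQR problems, increasing the discount factor until
# the resulting gain stabilizes the undiscounted system. The paper's algorithm is
# model-free (policy gradient); this model-based version shows only the schedule.
import numpy as np
from scipy.linalg import solve_discrete_are

A = np.array([[1.2, 0.5], [0.0, 1.1]])    # placeholder unstable dynamics
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)

def discounted_lqr_gain(gamma):
    # gamma-discounted LQR == undiscounted LQR with dynamics scaled by sqrt(gamma)
    Ag, Bg = np.sqrt(gamma) * A, np.sqrt(gamma) * B
    P = solve_discrete_are(Ag, Bg, Q, R)
    return np.linalg.solve(R + Bg.T @ P @ Bg, Bg.T @ P @ Ag)

gamma = 0.3
while True:
    K = discounted_lqr_gain(gamma)
    rho = max(abs(np.linalg.eigvals(A - B @ K)))   # undiscounted closed loop under u = -Kx
    if rho < 1.0:
        break                                      # K stabilizes the true system
    r = np.sqrt(gamma) * rho                       # discounted closed loop is stable: r < 1
    gamma = min(1.0, gamma / r**2)                 # enlarge the discount and repeat

print(f"stabilizing gain found at gamma = {gamma:.3f}, spectral radius = {rho:.3f}")
```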


Robust exploration in linear quadratic reinforcement learning

Neural Information Processing Systems

Learning to make decisions in an uncertain and dynamic environment is a task of fundamental importance in a number of domains. This paper concerns the problem of learning control policies for an unknown linear dynamical system so as to minimize a quadratic cost function. We present a method, based on convex optimization, that accomplishes this task 'robustly', i.e., the worst-case cost, accounting for system uncertainty given the observed data, is minimized. The method balances exploitation and exploration, exciting the system so as to reduce uncertainty in the model parameters to which the worst-case cost is most sensitive. Numerical simulations and application to a hardware-in-the-loop servo-mechanism are used to demonstrate the approach, with appreciable performance and robustness gains over alternative methods observed in both.
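
To make the worst-case objective concrete, the following is a crude scenario-based stand-in (not the paper's convex program): excite the system, fit (A, B) by least squares, sample plausible models around the estimate, and choose, from a small grid of candidate gains, the one with the smallest worst-case LQR cost over those samples. All numbers are placeholders.

```python
# Scenario-based stand-in for worst-case (robust) controller selection. The paper
# solves a convex program; this sketch only illustrates the objective.
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

rng = np.random.default_rng(1)
A_true = np.array([[0.9, 0.4], [0.0, 0.8]])      # placeholder system, unknown in practice
B_true = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)

# (i) Collect data with an exciting (random) input and fit [A B] by least squares.
T, x = 200, np.zeros(2)
X, Z = [], []
for _ in range(T):
    u = rng.normal(size=1)                        # exploratory excitation
    x_next = A_true @ x + B_true @ u + 0.05 * rng.normal(size=2)
    Z.append(np.concatenate([x, u])); X.append(x_next); x = x_next
Z, X = np.array(Z), np.array(X)
AB_hat = np.linalg.lstsq(Z, X, rcond=None)[0].T   # [A_hat B_hat]

def lqr_cost(A, B, K):
    """Infinite-horizon LQR cost of u = -Kx (unit-covariance start), inf if unstable."""
    Acl = A - B @ K
    if max(abs(np.linalg.eigvals(Acl))) >= 1.0:
        return np.inf
    return np.trace(solve_discrete_lyapunov(Acl.T, Q + K.T @ R @ K))

# (ii) Sample plausible models around the estimate; (iii) pick the candidate gain
# whose worst-case cost over those samples is smallest.
models = [(AB_hat[:, :2] + 0.05 * rng.normal(size=(2, 2)),
           AB_hat[:, 2:] + 0.05 * rng.normal(size=(2, 1))) for _ in range(50)]
candidates = [np.array([[k1, k2]])
              for k1 in np.linspace(0, 1, 11) for k2 in np.linspace(0, 2, 11)]
K_robust = min(candidates, key=lambda K: max(lqr_cost(A, B, K) for A, B in models))
print(K_robust)
```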


Learning convex bounds for linear quadratic control policy synthesis

Neural Information Processing Systems

Learning to make decisions from observed data in dynamic environments remains a problem of fundamental importance in a number of fields, from artificial intelligence and robotics, to medicine and finance. This paper concerns the problem of learning control policies for unknown linear dynamical systems so as to maximize a quadratic reward function. We present a method to optimize the expected value of the reward over the posterior distribution of the unknown system parameters, given data. The algorithm involves sequential convex programming, and enjoys reliable local convergence and robust stability guarantees. Numerical simulations and stabilization of a real-world inverted pendulum are used to demonstrate the approach, with strong performance and robustness properties observed in both.
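
The objective here differs from the worst-case criterion above in that the cost is averaged over the posterior. A Monte Carlo sketch of that expected-cost objective is given below; the paper instead optimizes a convex upper bound via sequential convex programming, and all values shown are placeholders.

```python
# Posterior-averaged (rather than worst-case) policy evaluation: sample systems from
# a posterior over (A, B) and score a gain by its average LQR cost. This sketch only
# illustrates the objective, not the paper's convex-bound optimization.
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

rng = np.random.default_rng(2)
A_hat = np.array([[0.9, 0.4], [0.0, 0.8]])       # posterior mean (placeholder)
B_hat = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)

def cost(A, B, K):
    Acl = A - B @ K
    if max(abs(np.linalg.eigvals(Acl))) >= 1.0:
        return np.inf                             # unstable closed loop
    return np.trace(solve_discrete_lyapunov(Acl.T, Q + K.T @ R @ K))

def expected_cost(K, n_samples=200, scale=0.03):
    """Monte Carlo estimate of the expected cost over a Gaussian posterior around (A_hat, B_hat)."""
    samples = ((A_hat + scale * rng.normal(size=(2, 2)),
                B_hat + scale * rng.normal(size=(2, 1))) for _ in range(n_samples))
    return np.mean([cost(A, B, K) for A, B in samples])

print(expected_cost(np.array([[0.5, 1.0]])))      # score one candidate gain
```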


Robust exploration in linear quadratic reinforcement learning

arXiv.org Machine Learning

This paper concerns the problem of learning control policies for an unknown linear dynamical system to minimize a quadratic cost function. We present a method, based on convex optimization, that accomplishes this task robustly; that is, we minimize the worst-case cost, accounting for system uncertainty given the observed data. The method balances exploitation and exploration, exciting the system so as to reduce uncertainty in the model parameters to which the worst-case cost is most sensitive. Numerical simulations and application to a hardware-in-the-loop servo-mechanism demonstrate the approach, with appreciable performance and robustness gains over alternative methods observed in both.


Nonlinear input design as optimal control of a Hamiltonian system

arXiv.org Machine Learning

We propose an input design method for a general class of parametric probabilistic models, including nonlinear dynamical systems with process noise. The goal of the procedure is to select inputs such that the parameter posterior distribution concentrates about the true value of the parameters; however, exact computation of the posterior is intractable. By representing (samples from) the posterior as trajectories from a certain Hamiltonian system, we transform the input design task into an optimal control problem. The method is illustrated via numerical examples, including MRI pulse sequence design.
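
As a much simpler surrogate for the goal stated here (inputs that concentrate the posterior about the true parameters), the sketch below scores a few candidate input sequences by the Fisher information of a toy scalar model and picks the best. This is not the paper's Hamiltonian / optimal-control formulation; the model and all values are placeholders.

```python
# Input-design sketch: among a few candidate input sequences, select the one that
# maximizes a Fisher-information proxy for posterior concentration. Toy model:
# y_t = exp(-theta * u_t) + e_t, with e_t ~ N(0, sigma^2).
import numpy as np

theta_nominal, sigma = 0.7, 0.1           # nominal parameter and noise std (placeholders)

def fisher_information(u, theta=theta_nominal):
    # I(theta) = sum_t (d/dtheta exp(-theta * u_t))^2 / sigma^2
    dydtheta = -u * np.exp(-theta * u)
    return np.sum(dydtheta**2) / sigma**2

candidates = {
    "constant": np.full(20, 1.0),
    "ramp":     np.linspace(0.0, 2.0, 20),
    "impulses": np.tile([0.0, 2.0], 10),
}
best = max(candidates, key=lambda name: fisher_information(candidates[name]))
for name, u in candidates.items():
    print(f"{name:9s} I(theta) = {fisher_information(u):8.1f}")
print("selected input:", best)
```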


Learning convex bounds for linear quadratic control policy synthesis

arXiv.org Machine Learning

Learning to make decisions from observed data in dynamic environments remains a problem of fundamental importance in a number of fields, from artificial intelligence and robotics, to medicine and finance. This paper concerns the problem of learning control policies for unknown linear dynamical systems so as to maximize a quadratic reward function. We present a method to optimize the expected value of the reward over the posterior distribution of the unknown system parameters, given data. The algorithm involves sequential convex programming, and enjoys reliable local convergence and robust stability guarantees. Numerical simulations and stabilization of a real-world inverted pendulum are used to demonstrate the approach, with strong performance and robustness properties observed in both.