Pfrommer, Daniel
The Pitfalls of Imitation Learning when Actions are Continuous
Simchowitz, Max, Pfrommer, Daniel, Jadbabaie, Ali
We study the problem of imitating an expert demonstrator in a discrete-time, continuous state-and-action control system. We show that, even if the dynamics are stable (i.e. contracting exponentially quickly) and the expert is smooth and deterministic, any smooth, deterministic imitator policy necessarily suffers error on execution that is exponentially larger, as a function of the problem horizon, than the error under the distribution of expert training data. Our negative result applies to both behavior cloning and offline-RL algorithms, unless they produce highly "improper" imitator policies--those which are non-smooth, non-Markovian, or which exhibit highly state-dependent stochasticity--or unless the expert trajectory distribution is sufficiently "spread." We provide experimental evidence of the benefits of these more complex policy parameterizations, explicating the advantages of today's popular policy parameterizations in robot learning (e.g. action-chunking and Diffusion Policies). We also establish a host of complementary negative and positive results for imitation in control systems.
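A minimal Python sketch (illustrative only, not the paper's lower-bound construction) of the two quantities being compared: the imitator's error on states visited by the expert versus the deviation that accumulates when the imitator is rolled out in closed loop. The scalar dynamics, expert, and imitator below are invented for illustration.

```python
import numpy as np

T = 100                                  # problem horizon
A, B = 0.9, 1.0                          # open-loop stable (contracting) scalar dynamics

expert = lambda x: -0.5 * x                                  # smooth, deterministic expert
imitator = lambda x: -0.5 * x + 0.02 * np.tanh(5.0 * x)      # small, smooth training error

# States visited by the expert define the training distribution.
xs = [1.0]
for _ in range(T):
    xs.append(A * xs[-1] + B * expert(xs[-1]))
train_err = max(abs(imitator(x) - expert(x)) for x in xs)

# Error on execution: roll out the imitator and track deviation from the expert path.
x, devs = xs[0], []
for t in range(T):
    x = A * x + B * imitator(x)
    devs.append(abs(x - xs[t + 1]))

print(f"worst action error on expert states: {train_err:.4f}")
print(f"worst state deviation on execution:  {max(devs):.4f}")
```

In this benign toy the two quantities remain comparable; the paper's contribution is to construct stable, smooth instances in which the execution error is exponentially larger in the horizon than the training-distribution error.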
Improved Sample Complexity of Imitation Learning for Barrier Model Predictive Control
Pfrommer, Daniel, Padmanabhan, Swati, Ahn, Kwangjun, Umenberger, Jack, Marcucci, Tobia, Mhammedi, Zakaria, Jadbabaie, Ali
Imitation learning has emerged as a powerful tool in machine learning, enabling agents to learn complex behaviors by imitating expert demonstrations acquired either from a human demonstrator or a policy computed offline [3, 11, 12, 13]. Despite its significant success, imitation learning often suffers from a compounding error problem: Successive evaluations of the approximate policy could accumulate error, resulting in out-of-distribution failures [3]. Recent results in imitation learning [31, 32, 34] have identified smoothness (i.e., Lipschitzness of the derivative of the optimal controller with respect to the initial state) and stability of the expert as two key properties that circumvent this issue, thereby allowing for end-to-end performance guarantees for the final learned controller. In this paper, our focus is on enabling such guarantees when the expert being imitated is a Model Predictive Controller (MPC), a powerful class of control algorithms based on solving an optimization problem over a receding prediction horizon [23]. In some cases, the solution to this multiparametric optimization problem, known as the explicit MPC representation [6], can be pre-computed. For instance, in our setup -- linear systems with polytopic constraints -- the optimal control input is a piecewise affine (and, hence, highly non-smooth) function of the state [6].
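As a point of reference for the setting described above, the following is a minimal implicit (receding-horizon) MPC sketch for a linear system with box constraints, written with cvxpy; the dynamics, costs, horizon, and bounds are placeholder values, not those of the paper.

```python
import numpy as np
import cvxpy as cp

A = np.array([[1.0, 0.1], [0.0, 1.0]])     # discrete-time double-integrator-like dynamics
B = np.array([[0.0], [0.1]])
Q, R = np.eye(2), 0.1 * np.eye(1)          # quadratic stage costs
N = 10                                     # prediction horizon

def mpc_action(x0):
    """Solve the finite-horizon QP from state x0 and return only the first input."""
    x = cp.Variable((2, N + 1))
    u = cp.Variable((1, N))
    cost, constraints = 0, [x[:, 0] == x0]
    for k in range(N):
        cost += cp.quad_form(x[:, k], Q) + cp.quad_form(u[:, k], R)
        constraints += [x[:, k + 1] == A @ x[:, k] + B @ u[:, k],
                        cp.abs(u[:, k]) <= 1.0,       # polytopic (box) input constraints
                        cp.abs(x[:, k + 1]) <= 5.0]   # polytopic (box) state constraints
    cp.Problem(cp.Minimize(cost), constraints).solve()
    return u[:, 0].value

# Receding-horizon execution: re-solve at every step and apply only the first input.
x = np.array([3.0, 0.0])
for t in range(30):
    x = A @ x + B @ mpc_action(x)
```

The explicit MPC view referenced above instead pre-computes the map from x0 to the optimal input, which for linear systems with polytopic constraints is piecewise affine in the state.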
Provable Guarantees for Generative Behavior Cloning: Bridging Low-Level Stability and High-Level Behavior
Block, Adam, Jadbabaie, Ali, Pfrommer, Daniel, Simchowitz, Max, Tedrake, Russ
Training dynamic agents from datasets of expert examples, known as imitation learning, promises to take advantage of the plentiful demonstrations available in the modern data environment, in a manner analogous to the recent successes of language models conducting unsupervised learning on enormous corpora of text [68, 71]. Imitation learning is especially exciting in robotics, where vast stores of pre-recorded demonstrations on YouTube [1] or cheaply collected simulated trajectories [43, 20] can be converted into learned robotic policies. For imitation learning to be a viable path toward generalist robotic behavior, it needs to be able to both represent and execute the complex behaviors exhibited in the demonstrated data. An approach that has shown tremendous promise is generative behavior cloning: fitting generative models, such as diffusion models [2, 19, 34], to expert demonstrations with pure supervised learning. In this paper, we ask: Under what conditions can generative behavior cloning imitate arbitrarily complex expert behavior? In particular, we are interested in how algorithmic choices interface with the dynamics of the agent's environment to render imitation possible. The key challenge separating imitation learning from vanilla supervised learning is one of compounding error: when the learner executes the trained behavior in its environment, small mistakes can accumulate into larger ones; this in turn may bring the agent to regions of state space not seen during training, leading to larger-still deviations from intended trajectories.
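A schematic behavior-cloning loop in PyTorch: fit a policy to expert (observation, action) pairs by pure supervised learning. Here the conditional model is only a diagonal Gaussian trained by maximum likelihood; generative behavior cloning as discussed above swaps in a more expressive conditional model (e.g. a diffusion model) trained on the same data. All shapes, data, and hyperparameters are placeholders.

```python
import torch
import torch.nn as nn

obs_dim, act_dim = 10, 4
obs = torch.randn(1000, obs_dim)       # stand-in for expert observations
acts = torch.randn(1000, act_dim)      # stand-in for expert actions

net = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                    nn.Linear(128, 2 * act_dim))       # predicts mean and log-std
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(2000):
    mean, log_std = net(obs).chunk(2, dim=-1)
    dist = torch.distributions.Normal(mean, log_std.exp())
    loss = -dist.log_prob(acts).sum(dim=-1).mean()     # negative log-likelihood
    opt.zero_grad()
    loss.backward()
    opt.step()
```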
Smooth Model Predictive Control with Applications to Statistical Learning
Ahn, Kwangjun, Pfrommer, Daniel, Umenberger, Jack, Marcucci, Tobia, Mhammedi, Zak, Jadbabaie, Ali
Approximating complex state-feedback controllers by parametric deep neural network models is a straightforward technique for reducing the computational overhead of complex control policies, particularly in the context of Model Predictive Control (MPC). Learning a feedback controller to imitate an MPC policy over a given state distribution can overcome the limitations of both the implicit (online) and explicit (offline) variants of MPC. Implicit MPC uses an iterative numerical solver to obtain the optimal solution, which can be intractable in real time for high-dimensional systems with complex dynamics. Conversely, explicit MPC finds an offline formulation of the MPC controller via multi-parametric programming which can be quickly queried, but where the complexity of the explicit representation scales poorly in the problem dimensions. Imitation learning (i.e., finding a feedback controller which approximates and performs similarly to the MPC policy) can transcend these limitations by using the computationally expensive iterative numerical solver offline to learn a cheaply queryable, approximate policy solely over the state distribution relevant to the control problem, thereby bypassing the need to store the exact policy representation over the entire state domain. For continuous control problems, where approximately optimal control inputs are sufficient to solve the task, imitation learning is a direct path toward computationally inexpensive controllers that solve difficult, high-dimensional control problems in real time.
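A minimal sketch of the imitation pipeline described above: query an expensive MPC solver offline on states drawn from the task-relevant distribution, then fit a cheap neural-network surrogate to the resulting (state, input) pairs. `expensive_mpc_solve` is a hypothetical stand-in for an iterative numerical MPC solver; dimensions, data, and hyperparameters are illustrative.

```python
import torch
import torch.nn as nn

def expensive_mpc_solve(x):
    """Placeholder for an online MPC solve; returns a pretend optimal input."""
    return -0.5 * x[:2]

states = torch.randn(5000, 4)                                   # drawn from the relevant state distribution
inputs = torch.stack([expensive_mpc_solve(x) for x in states])  # labels computed offline

policy = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 2))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
for step in range(1000):
    loss = nn.functional.mse_loss(policy(states), inputs)
    opt.zero_grad()
    loss.backward()
    opt.step()

# At run time, policy(x) is a single cheap forward pass rather than an online optimization solve.
```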
The Power of Learned Locally Linear Models for Nonlinear Policy Optimization
Pfrommer, Daniel, Simchowitz, Max, Westenbroek, Tyler, Matni, Nikolai, Tu, Stephen
A common pipeline in learning-based control is to iteratively estimate a model of the system dynamics and apply a trajectory optimization algorithm (e.g., $\mathtt{iLQR}$) on the learned model to minimize a target cost. This paper conducts a rigorous analysis of a simplified variant of this strategy for general nonlinear systems. We analyze an algorithm which iterates between estimating local linear models of nonlinear system dynamics and performing $\mathtt{iLQR}$-like policy updates. We demonstrate that this algorithm attains sample complexity polynomial in relevant problem parameters and, by synthesizing locally stabilizing gains, overcomes exponential dependence on the problem horizon. Experimental results validate the performance of our algorithm and compare it to natural deep-learning baselines.
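A compact, self-contained Python/NumPy sketch of the kind of loop analyzed above: fit time-varying local linear models of the unknown nonlinear dynamics by least squares on small perturbed rollouts, run an $\mathtt{iLQR}$-like backward pass on the estimated model to obtain an input correction together with locally stabilizing gains, and repeat. The dynamics, cost, and all constants are toy choices for illustration and do not reproduce the paper's exact algorithm or guarantees.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n, m, alpha = 20, 2, 1, 0.5                    # horizon, state/input dims, step size
goal, Q, R = np.array([1.0, 0.0]), np.eye(2), 0.1 * np.eye(1)

def f(x, u):                                      # unknown-to-the-learner nonlinear dynamics
    return x + 0.1 * np.array([x[1], u[0] - np.sin(x[0])])

def rollout(u_nom, x_ref=None, K=None):
    xs, us, x = [np.zeros(n)], [], np.zeros(n)
    for t in range(T):
        u = u_nom[t] if K is None else u_nom[t] + K[t] @ (x - x_ref[t])
        x = f(x, u)
        us.append(u)
        xs.append(x)
    return np.array(xs), np.array(us)

u_nom = np.zeros((T, m))
for it in range(15):
    x_nom, _ = rollout(u_nom)
    # 1. Estimate local linear models x_{t+1} ~ x_nom[t+1] + A_t dx + B_t du
    #    by least squares on small random perturbations around the trajectory.
    A_list, B_list = [], []
    for t in range(T):
        Z, Y = [], []
        for _ in range(20):
            dx, du = 0.01 * rng.standard_normal(n), 0.01 * rng.standard_normal(m)
            Z.append(np.concatenate([dx, du]))
            Y.append(f(x_nom[t] + dx, u_nom[t] + du) - x_nom[t + 1])
        G = np.linalg.lstsq(np.array(Z), np.array(Y), rcond=None)[0].T
        A_list.append(G[:, :n])
        B_list.append(G[:, n:])
    # 2. iLQR-like backward pass on the learned model: open-loop corrections k_t
    #    and locally stabilizing feedback gains K_t.
    Vx, Vxx = Q @ (x_nom[-1] - goal), Q.copy()
    k, K = np.zeros((T, m)), np.zeros((T, m, n))
    for t in reversed(range(T)):
        A, B = A_list[t], B_list[t]
        Qx, Qu = Q @ (x_nom[t] - goal) + A.T @ Vx, R @ u_nom[t] + B.T @ Vx
        Qxx, Quu, Qux = Q + A.T @ Vxx @ A, R + B.T @ Vxx @ B, B.T @ Vxx @ A
        k[t], K[t] = -np.linalg.solve(Quu, Qu), -np.linalg.solve(Quu, Qux)
        Vx = Qx + K[t].T @ Quu @ k[t] + K[t].T @ Qu + Qux.T @ k[t]
        Vxx = Qxx + K[t].T @ Quu @ K[t] + K[t].T @ Qux + Qux.T @ K[t]
    # 3. Forward pass: apply the correction, stabilized by the gains K_t.
    _, u_nom = rollout(u_nom + alpha * k, x_nom, K)

print("final state:", rollout(u_nom)[0][-1], "target:", goal)
```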