Lee, Bruce D.
Policy Gradient for LQR with Domain Randomization
Fujinami, Tesshu, Lee, Bruce D., Matni, Nikolai, Pappas, George J.
-- Domain randomization (DR) enables sim-to-real transfer by training controllers on a distribution of simulated environments, with the goal of achieving robust performance in the real world. Although DR is widely used in practice and is often solved using simple policy gradient (PG) methods, understanding of its theoretical guarantees remains limited. T oward addressing this gap, we provide the first convergence analysis of PG methods for domain-randomized linear quadratic regulation (LQR). We show that PG converges globally to the minimizer of a finite-sample approximation of the DR objective under suitable bounds on the heterogeneity of the sampled systems. We also quantify the sample-complexity associated with achieving a small performance gap between the sample-average and population-level objectives. Additionally, we propose and analyze a discount-factor annealing algorithm that obviates the need for an initial jointly stabilizing controller, which may be challenging to find. Empirical results support our theoretical findings and highlight promising directions for future work, including risk-sensitive DR formulations and stochastic PG algorithms. Domain randomization (DR) has emerged as a dominant paradigm to enable transfer of policies optimized in simulation to the real world by randomizing simulator parameters during training [1-3]. In doing so, just as with robust control, DR accounts for discrepancies between the model used in simulation to synthesize a policy and the system that it is deployed on. Since DR does not solely focus on optimizing the worst-case performance, it can result in less conservative controller performance while still ensuring robust stability with high probability. Furthermore, DR can be easily implemented via first order methods. This makes it straightforward to incorporate into a wide variety of reinforcement learning schemes and to benefit from the increasing availability of parallel computation. Despite the ease with which DR can be implemented using first order methods, ensuring convergence of these methods remains a critical challenge, with practitioners relying upon complex scheduling of various hyperparameters in the optimization procedure [3].
- North America > United States > Pennsylvania (0.04)
- North America > Canada (0.04)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.48)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.48)
Logarithmic Regret for Nonlinear Control
Wang, James, Lee, Bruce D., Ziemann, Ingvar, Matni, Nikolai
We address the problem of learning to control an unknown nonlinear dynamical system through sequential interactions. Motivated by high-stakes applications in which mistakes can be catastrophic, such as robotics and healthcare, we study situations where it is possible for fast sequential learning to occur. Fast sequential learning is characterized by the ability of the learning agent to incur logarithmic regret relative to a fully-informed baseline. We demonstrate that fast sequential learning is achievable in a diverse class of continuous control problems where the system dynamics depend smoothly on unknown parameters, provided the optimal control policy is persistently exciting. Additionally, we derive a regret bound which grows with the square root of the number of interactions for cases where the optimal policy is not persistently exciting. Our results provide the first regret bounds for controlling nonlinear dynamical systems depending nonlinearly on unknown parameters. We validate the trends our theory predicts in simulation on a simple dynamical system.
- Asia > Middle East > Jordan (0.04)
- North America > United States > Pennsylvania (0.04)
Guarantees for Nonlinear Representation Learning: Non-identical Covariates, Dependent Data, Fewer Samples
Zhang, Thomas T., Lee, Bruce D., Ziemann, Ingvar, Pappas, George J., Matni, Nikolai
A driving force behind the diverse applicability of modern machine learning is the ability to extract meaningful features across many sources. However, many practical domains involve data that are non-identically distributed across sources, and statistically dependent within its source, violating vital assumptions in existing theoretical studies. Toward addressing these issues, we establish statistical guarantees for learning general $\textit{nonlinear}$ representations from multiple data sources that admit different input distributions and possibly dependent data. Specifically, we study the sample-complexity of learning $T+1$ functions $f_\star^{(t)} \circ g_\star$ from a function class $\mathcal F \times \mathcal G$, where $f_\star^{(t)}$ are task specific linear functions and $g_\star$ is a shared nonlinear representation. A representation $\hat g$ is estimated using $N$ samples from each of $T$ source tasks, and a fine-tuning function $\hat f^{(0)}$ is fit using $N'$ samples from a target task passed through $\hat g$. We show that when $N \gtrsim C_{\mathrm{dep}} (\mathrm{dim}(\mathcal F) + \mathrm{C}(\mathcal G)/T)$, the excess risk of $\hat f^{(0)} \circ \hat g$ on the target task decays as $\nu_{\mathrm{div}} \big(\frac{\mathrm{dim}(\mathcal F)}{N'} + \frac{\mathrm{C}(\mathcal G)}{N T} \big)$, where $C_{\mathrm{dep}}$ denotes the effect of data dependency, $\nu_{\mathrm{div}}$ denotes an (estimatable) measure of $\textit{task-diversity}$ between the source and target tasks, and $\mathrm C(\mathcal G)$ denotes the complexity of the representation class $\mathcal G$. In particular, our analysis reveals: as the number of tasks $T$ increases, both the sample requirement and risk bound converge to that of $r$-dimensional regression as if $g_\star$ had been given, and the effect of dependency only enters the sample requirement, leaving the risk bound matching the iid setting.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > Middle East > Jordan (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (3 more...)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.92)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Regret Analysis of Multi-task Representation Learning for Linear-Quadratic Adaptive Control
Lee, Bruce D., Toso, Leonardo F., Zhang, Thomas T., Anderson, James, Matni, Nikolai
Representation learning is a powerful tool that enables learning over large multitudes of agents or domains by enforcing that all agents operate on a shared set of learned features. However, many robotics or controls applications that would benefit from collaboration operate in settings with changing environments and goals, whereas most guarantees for representation learning are stated for static settings. Toward rigorously establishing the benefit of representation learning in dynamic settings, we analyze the regret of multi-task representation learning for linear-quadratic control. This setting introduces unique challenges. Firstly, we must account for and balance the $\textit{misspecification}$ introduced by an approximate representation. Secondly, we cannot rely on the parameter update schemes of single-task online LQR, for which least-squares often suffices, and must devise a novel scheme to ensure sufficient improvement. We demonstrate that for settings where exploration is "benign", the regret of any agent after $T$ timesteps scales as $\tilde O(\sqrt{T/H})$, where $H$ is the number of agents. In settings with "difficult" exploration, the regret scales as $\tilde{\mathcal O}(\sqrt{d_u d_\theta} \sqrt{T} + T^{3/4}/H^{1/5})$, where $d_x$ is the state-space dimension, $d_u$ is the input dimension, and $d_\theta$ is the task-specific parameter count. In both cases, by comparing to the minimax single-task regret $\tilde{\mathcal O}(\sqrt{d_x d_u^2}\sqrt{T})$, we see a benefit of a large number of agents. Notably, in the difficult exploration case, by sharing a representation across tasks, the effective task-specific parameter count can often be small $d_\theta < d_x d_u$. Lastly, we provide numerical validation of the trends we predict.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Jordan (0.04)
- North America > United States > Pennsylvania (0.04)
- Europe > Denmark (0.04)
Active Learning for Control-Oriented Identification of Nonlinear Systems
Lee, Bruce D., Ziemann, Ingvar, Pappas, George J., Matni, Nikolai
Model-based reinforcement learning is an effective approach for controlling an unknown system. It is based on a longstanding pipeline familiar to the control community in which one performs experiments on the environment to collect a dataset, uses the resulting dataset to identify a model of the system, and finally performs control synthesis using the identified model. As interacting with the system may be costly and time consuming, targeted exploration is crucial for developing an effective control-oriented model with minimal experimentation. Motivated by this challenge, recent work has begun to study finite sample data requirements and sample efficient algorithms for the problem of optimal exploration in model-based reinforcement learning. However, existing theory and algorithms are limited to model classes which are linear in the parameters. Our work instead focuses on models with nonlinear parameter dependencies, and presents the first finite sample analysis of an active learning algorithm suitable for a general class of nonlinear dynamics. In certain settings, the excess control cost of our algorithm achieves the optimal rate, up to logarithmic factors. We validate our approach in simulation, showcasing the advantage of active, control-oriented exploration for controlling nonlinear systems.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > Pennsylvania (0.04)
- Asia > Middle East > Jordan (0.04)
- Research Report (0.81)
- Instructional Material > Course Syllabus & Notes (0.45)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.54)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.48)
Uncertainty-Aware Deployment of Pre-trained Language-Conditioned Imitation Learning Policies
Wu, Bo, Lee, Bruce D., Daniilidis, Kostas, Bucher, Bernadette, Matni, Nikolai
Large-scale robotic policies trained on data from diverse tasks and robotic platforms hold great promise for enabling general-purpose robots; however, reliable generalization to new environment conditions remains a major challenge. Toward addressing this challenge, we propose a novel approach for uncertainty-aware deployment of pre-trained language-conditioned imitation learning agents. Specifically, we use temperature scaling to calibrate these models and exploit the calibrated model to make uncertainty-aware decisions by aggregating the local information of candidate actions. We implement our approach in simulation using three such pre-trained models, and showcase its potential to significantly enhance task completion rates. The accompanying code is accessible at the link: https://github.com/BobWu1998/uncertainty_quant_all.git
- North America > United States > Pennsylvania (0.04)
- North America > United States > Michigan (0.04)
- Asia > Middle East > Jordan (0.04)
Nonasymptotic Regret Analysis of Adaptive Linear Quadratic Control with Model Misspecification
Lee, Bruce D., Rantzer, Anders, Matni, Nikolai
The strategy of pre-training a large model on a diverse dataset, then fine-tuning for a particular application has yielded impressive results in computer vision, natural language processing, and robotic control. This strategy has vast potential in adaptive control, where it is necessary to rapidly adapt to changing conditions with limited data. Toward concretely understanding the benefit of pre-training for adaptive control, we study the adaptive linear quadratic control problem in the setting where the learner has prior knowledge of a collection of basis matrices for the dynamics. This basis is misspecified in the sense that it cannot perfectly represent the dynamics of the underlying data generating process. We propose an algorithm that uses this prior knowledge, and prove upper bounds on the expected regret after $T$ interactions with the system. In the regime where $T$ is small, the upper bounds are dominated by a term scales with either $\texttt{poly}(\log T)$ or $\sqrt{T}$, depending on the prior knowledge available to the learner. When $T$ is large, the regret is dominated by a term that grows with $\delta T$, where $\delta$ quantifies the level of misspecification. This linear term arises due to the inability to perfectly estimate the underlying dynamics using the misspecified basis, and is therefore unavoidable unless the basis matrices are also adapted online. However, it only dominates for large $T$, after the sublinear terms arising due to the error in estimating the weights for the basis matrices become negligible. We provide simulations that validate our analysis. Our simulations also show that offline data from a collection of related systems can be used as part of a pre-training stage to estimate a misspecified dynamics basis, which is in turn used by our adaptive controller.
- North America > United States > Pennsylvania (0.04)
- North America > United States > California > Orange County > Irvine (0.04)
Multi-Task Imitation Learning for Linear Dynamical Systems
Zhang, Thomas T., Kang, Katie, Lee, Bruce D., Tomlin, Claire, Levine, Sergey, Tu, Stephen, Matni, Nikolai
Imitation learning (IL), which learns control policies by imitating expert demonstrations, has demonstrated success across a variety of domains including self-driving cars (Codevilla et al., 2018) and robotics (Schaal, 1999). However, using IL to learn a robust behavior policy may require a large amount of training data (Ross et al., 2011), and expert demonstrations are often expensive to collect. One remedy for this problem is multi-task learning: using data from other tasks (source tasks) in addition to from the task of interest (target task) to jointly learn a policy. We study the application of multi-task learning to IL over linear systems, and demonstrate improved sample efficiency when learning a controller via representation learning. Our results expand on prior work that studies multi-task representation learning for supervised learning (Du et al., 2020; Tripuraneni et al., 2021), addressing the new challenges that arise in the imitation learning setting. First, the data for IL is temporally dependent, as it is generated from a dynamical system x [ t + 1] = f ( x[ t],u [t ],w [t ]). In contrast, the supervised learning setting assumes that both the train and test data are independent and identically distributed (i.i.d.) from the same underlying distribution. Furthermore, we are interested in the performance of the learned controller in closed-loop rather than its error on expert-controlled trajectories.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > Pennsylvania (0.04)
- North America > United States > California > Alameda County > Berkeley (0.04)