Pu, Ye
Towards Fast and Safety-Guaranteed Trajectory Planning and Tracking for Time-Varying Systems
Siriya, Seth, Chen, Mo, Pu, Ye
When deploying autonomous systems in unknown and changing environments, it is critical that their motion planning and control algorithms are computationally efficient and can be reapplied online in real time, whilst providing theoretical safety guarantees in the presence of disturbances. Satisfying these objectives becomes more challenging when considering time-varying dynamics and disturbances, which arise in real-world contexts. We develop methods with the potential to address these issues by applying an offline-computed, safety-guaranteeing controller on a physical system to track a virtual system that evolves through a trajectory that is replanned online, accounting for constraints updated online. The first method we propose is designed for general time-varying systems over a finite horizon. Our second method overcomes the finite horizon restriction for periodic systems. We simulate our algorithms on a case study of an autonomous underwater vehicle subject to wave disturbances.
Non-Asymptotic Bounds for Closed-Loop Identification of Unstable Nonlinear Stochastic Systems
Siriya, Seth, Zhu, Jingge, Nešić, Dragan, Pu, Ye
We consider the problem of least squares parameter estimation from single-trajectory data for discrete-time, unstable, closed-loop nonlinear stochastic systems, with linearly parameterised uncertainty. Assuming a region of the state space produces informative data, and the system is sub-exponentially unstable, we establish non-asymptotic guarantees on the estimation error at times where the state trajectory evolves in this region. If the whole state space is informative, high probability guarantees on the error hold for all times. Examples are provided where our results are useful for analysis, but existing results are not.
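As a rough illustration of the setting in this abstract, the following is a minimal sketch of least squares estimation from a single trajectory of a scalar system with one linearly parameterised unknown. It is not the paper's method or analysis. The feature function `phi`, the model x_{t+1} = x_t + theta * phi(x_t) + w_t, the Gaussian noise, and all numerical values are assumptions chosen only so the example runs; the state drifts roughly linearly, which is one simple instance of sub-exponential instability.

```python
import random


def phi(x):
    # Hypothetical bounded feature, chosen only for illustration.
    return x / (1.0 + abs(x))


def simulate(theta, steps, noise_std, seed=0):
    # Single trajectory of x_{t+1} = x_t + theta * phi(x_t) + w_t (assumed model).
    rng = random.Random(seed)
    xs = [1.0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x + theta * phi(x) + rng.gauss(0.0, noise_std))
    return xs


def least_squares_estimate(xs):
    # Scalar least squares: regress the increment x_{t+1} - x_t on phi(x_t).
    num = 0.0
    den = 0.0
    for x, x_next in zip(xs, xs[1:]):
        f = phi(x)
        num += f * (x_next - x)
        den += f * f
    return num / den


xs = simulate(theta=0.5, steps=2000, noise_std=0.1)
theta_hat = least_squares_estimate(xs)
print(f"estimate of theta: {theta_hat:.3f}")
```

In this toy model the regressor `phi(x)` stays away from zero once the state has moved away from the origin, so every part of the trajectory contributes informative data.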
Task-Oriented Koopman-Based Control with Contrastive Encoder
Lyu, Xubo, Hu, Hanyang, Siriya, Seth, Pu, Ye, Chen, Mo
We present task-oriented Koopman-based control that utilizes end-to-end reinforcement learning and a contrastive encoder to simultaneously learn the Koopman latent embedding, operator, and associated linear controller within an iterative loop. By prioritizing the task cost as the main objective for controller learning, we reduce the reliance of controller design on a well-identified model, which, for the first time to the best of our knowledge, extends Koopman control from low to high-dimensional, complex nonlinear systems, including pixel-based tasks and a real robot with lidar observations. Code and videos are available at https://sites.google.com/view/kpmlilatsupp/.
Data-driven Predictive Tracking Control based on Koopman Operators
Wang, Ye, Yang, Yujia, Pu, Ye, Manzie, Chris
Constraint handling during tracking operations is at the core of many real-world control implementations and is well understood when dynamic models of the underlying system exist, yet becomes more challenging when data-driven models are used to describe the nonlinear system at hand. We seek to combine the nonlinear modeling capabilities of a wide class of neural networks with the constraint-handling guarantees of model predictive control (MPC) in a rigorous and online computationally tractable framework. The class of networks considered can be captured using Koopman operators, and is integrated into a Koopman-based tracking MPC (KTMPC) for nonlinear systems to track piecewise constant references. The effect of model mismatch between the original nonlinear dynamics and their trained Koopman linear model is handled by using a constraint tightening approach in the proposed tracking MPC strategy. By choosing two Lyapunov functions, we prove that the solution is recursively feasible and input-to-state stable to a neighborhood of both online and offline optimal reachable steady outputs in the presence of bounded modeling errors under mild assumptions. Finally, we demonstrate the results on a numerical example, before applying the proposed approach to the problem of reference tracking by an autonomous ground vehicle.
Stability Bounds for Learning-Based Adaptive Control of Discrete-Time Multi-Dimensional Stochastic Linear Systems with Input Constraints
Siriya, Seth, Zhu, Jingge, Nešić, Dragan, Pu, Ye
We consider the problem of adaptive stabilization for discrete-time, multi-dimensional linear systems with bounded control input constraints and unbounded stochastic disturbances, where the parameters of the true system are unknown. To address this challenge, we propose a certainty-equivalent control scheme which combines online parameter estimation with saturated linear control. We establish the existence of a high probability stability bound on the closed-loop system, under additional assumptions on the system and noise processes. Finally, numerical examples are presented to illustrate our results. Adaptive control (AC) is concerned with the design of controllers for dynamical systems whose model parameters are unknown.
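The certainty-equivalent scheme described in this abstract can be sketched for a scalar system as follows. This is only a toy illustration, not the paper's algorithm: the model x_{t+1} = a x_t + u_t + w_t with a known unit input gain, the marginally stable value a = 1, the Gaussian noise, the initial state, and the function names `saturate` and `run_adaptive_loop` are all assumptions made so the example is self-contained.

```python
import random


def saturate(u, u_max):
    # Clip the control input to the interval [-u_max, u_max].
    return max(-u_max, min(u_max, u))


def run_adaptive_loop(a_true=1.0, u_max=1.0, noise_std=0.1, steps=500, seed=0):
    # Assumed scalar model: x_{t+1} = a_true * x_t + u_t + w_t, input gain known to be 1.
    rng = random.Random(seed)
    x = 2.0        # assumed initial state
    a_hat = 0.0    # initial parameter guess
    num = 0.0      # running sum of x_t * (x_{t+1} - u_t)
    den = 0.0      # running sum of x_t ** 2
    xs, us = [x], []
    for _ in range(steps):
        # Certainty equivalence: act as if the current estimate were the true parameter.
        u = saturate(-a_hat * x, u_max)
        x_next = a_true * x + u + rng.gauss(0.0, noise_std)
        # Online least squares update of the estimate of a_true.
        num += x * (x_next - u)
        den += x * x
        if den > 0.0:
            a_hat = num / den
        xs.append(x_next)
        us.append(u)
        x = x_next
    return xs, us, a_hat


xs, us, a_hat = run_adaptive_loop()
print(f"final estimate: {a_hat:.3f}, largest |x|: {max(abs(v) for v in xs):.3f}")
```

The saturation step is what enforces the input constraint; the estimate only changes the linear gain that is fed into it.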
MBVI: Model-Based Value Initialization for Reinforcement Learning
Lyu, Xubo, Li, Site, Siriya, Seth, Pu, Ye, Chen, Mo
Model-free reinforcement learning (RL) is capable of learning control policies for high-dimensional, complex robotic tasks, but tends to be data inefficient. Model-based RL and optimal control have been proven to be much more data-efficient if an accurate model of the system and environment is known, but can be difficult to scale to expressive models for high-dimensional problems. In this paper, we propose a novel approach to alleviate data inefficiency of model-free RL by warm-starting the learning process using model-based solutions. We do so by initializing a high-dimensional value function via supervision from a low-dimensional value function obtained by applying model-based techniques on a low-dimensional problem featuring an approximate system model. Therefore, our approach exploits the model priors from a simplified problem space implicitly and avoids the direct use of high-dimensional, expressive models. We demonstrate our approach on two representative robotic learning tasks and observe significant improvements in performance and efficiency, and analyze our method empirically with a third task.