Xue, Haoru
Pre-training Auto-regressive Robotic Models with 4D Representations
Niu, Dantong, Sharma, Yuvan, Xue, Haoru, Biamby, Giscard, Zhang, Junyi, Ji, Ziteng, Darrell, Trevor, Herzig, Roei
Foundation models pre-trained on massive unlabeled datasets have revolutionized natural language and computer vision, exhibiting remarkable generalization capabilities, thus highlighting the importance of pre-training. Yet, efforts in robotics have struggled to achieve similar success, limited by either the need for costly robotic annotations or the lack of representations that effectively model the physical world. In this paper, we introduce ARM4R, an Auto-regressive Robotic Model that leverages low-level 4D Representations learned from human video data to yield a better pretrained robotic model. Specifically, we focus on utilizing 3D point tracking representations from videos derived by lifting 2D representations into 3D space via monocular depth estimation across time. These 4D representations maintain a shared ...
This could potentially be attributed to the scarcity of large-scale, diverse robotic data, unlike the abundance of text and image data available for vision and language FMs. The lack of robotic data poses a significant bottleneck in training foundation models that can effectively generalize across diverse robotic platforms and tasks. To overcome this limitation, several recent approaches (Xiao et al., 2022; Ye et al., 2024) employ representation learning by pre-training on an abundance of human data, enabling transfer to robotic systems. These approaches aim to recognize the inherent similarities between human and robot manipulation tasks and exploit the vast repositories of human video data available on the internet. Yet, these approaches have not been able to demonstrate effective generalization to downstream tasks. In part, this is due to their representations lacking an understanding of the physical world (Zhen et al., 2024a), and therefore being less effective for robotics.
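The abstract's idea of lifting 2D point tracks into 3D via per-frame depth can be illustrated with a minimal sketch. It assumes a standard pinhole camera model with known intrinsics, which the text does not specify; the function names, the intrinsics, and the toy tracks and depths are all hypothetical, and this is not the ARM4R pipeline itself.

```python
def unproject(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) at a given depth into camera-frame (X, Y, Z)."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def lift_tracks(tracks_2d, depths, intrinsics):
    """Lift per-frame 2D tracks to 3D.

    tracks_2d[t][n] is the (u, v) pixel of point n in frame t;
    depths[t][n] is the estimated depth of that point in frame t.
    Returns tracks_3d[t][n] = (X, Y, Z), i.e. 3D points over time.
    """
    fx, fy, cx, cy = intrinsics
    return [
        [unproject(u, v, d, fx, fy, cx, cy) for (u, v), d in zip(pts, ds)]
        for pts, ds in zip(tracks_2d, depths)
    ]


# Hypothetical toy input: two frames, two tracked points.
intrinsics = (500.0, 500.0, 320.0, 240.0)  # assumed fx, fy, cx, cy
tracks_2d = [[(320.0, 240.0), (420.0, 240.0)],
             [(330.0, 240.0), (420.0, 290.0)]]
depths = [[2.0, 1.0], [2.0, 1.0]]  # stand-in for monocular depth estimates
tracks_3d = lift_tracks(tracks_2d, depths, intrinsics)
```

Indexing the result by frame and then by point gives a 3D trajectory for each tracked point, which is one simple reading of a "4D" (3D plus time) representation.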
Agile Mobility with Rapid Online Adaptation via Meta-learning and Uncertainty-aware MPPI
Kalaria, Dvij, Xue, Haoru, Xiao, Wenli, Tao, Tony, Shi, Guanya, Dolan, John M.
Modern non-linear model-based controllers require an accurate physics model and model parameters to be able to control mobile robots at their limits. Also, due to surface slipping at high speeds, the friction parameters may continually change (like tire degradation in autonomous racing), and the controller may need to adapt rapidly. Many works derive a task-specific robot model with a parameter adaptation scheme that works well for the task but requires a lot of effort and tuning for each platform and task. In this work, we design a full model-learning-based controller based on meta pre-training that can very quickly adapt using few-shot dynamics data to any wheel-based robot with any model parameters, while also reasoning about model uncertainty. We demonstrate our results in small-scale numeric simulation, the large-scale Unity simulator, and on a medium-scale hardware platform with a wide range of settings. We show that our results are comparable to domain-specific well-engineered controllers, and have excellent generalization performance across all scenarios.
AnyCar to Anywhere: Learning Universal Dynamics Model for Agile and Adaptive Mobility
Xiao, Wenli, Xue, Haoru, Tao, Tony, Kalaria, Dvij, Dolan, John M., Shi, Guanya
Recent works in the robot learning community have successfully introduced generalist models capable of controlling various robot embodiments across a wide range of tasks, such as navigation and locomotion. However, achieving agile control, which pushes the limits of robotic performance, still relies on specialist models that require extensive parameter tuning. To leverage generalist-model adaptability and flexibility while achieving specialist-level agility, we propose AnyCar, a transformer-based generalist dynamics model designed for agile control of various wheeled robots. To collect training data, we unify multiple simulators and leverage different physics backends to simulate vehicles with diverse sizes, scales, and physical properties across various terrains. With robust training and real-world fine-tuning, our model enables precise adaptation to different vehicles, even in the wild and under large state estimation errors. In real-world experiments, AnyCar shows both few-shot and zero-shot generalization across a wide range of vehicles and environments, where our model, combined with a sampling-based MPC, outperforms specialist models by up to 54%. These results represent a key step toward building a foundation model for agile wheeled robot control. We will also open-source our framework to support further research.
Full-Order Sampling-Based MPC for Torque-Level Locomotion Control via Diffusion-Style Annealing
Xue, Haoru, Pan, Chaoyi, Yi, Zeji, Qu, Guannan, Shi, Guanya
Due to high dimensionality and non-convexity, real-time optimal control using full-order dynamics models for legged robots is challenging. Therefore, Nonlinear Model Predictive Control (NMPC) approaches are often limited to reduced-order models. Sampling-based MPC has shown potential in nonconvex and even discontinuous problems, but often yields suboptimal solutions with high variance, which limits its applications in high-dimensional locomotion. This work introduces DIAL-MPC (Diffusion-Inspired Annealing for Legged MPC), a sampling-based MPC framework with a novel diffusion-style annealing process. Such an annealing process is supported by the theoretical landscape analysis of Model Predictive Path Integral Control (MPPI) and the connection between MPPI and single-step diffusion. Algorithmically, DIAL-MPC iteratively refines solutions online and achieves both global coverage and local convergence. In quadrupedal torque-level control tasks, DIAL-MPC reduces the tracking error of standard MPPI by $13.4$ times and outperforms reinforcement learning (RL) policies by $50\%$ in challenging climbing tasks without any training. In particular, DIAL-MPC enables precise real-world quadrupedal jumping with payload. To the best of our knowledge, DIAL-MPC is the first training-free method that optimizes over full-order quadruped dynamics in real-time.
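To make the sampling-based idea concrete, here is a minimal sketch of a generic MPPI-style update repeated over a shrinking noise schedule. It is not the DIAL-MPC algorithm: the toy tracking cost, the noise schedule, the temperature, and all function names are assumptions chosen only to illustrate the "sample, weight by cost, average, then refine with less noise" pattern.

```python
import math
import random


def mppi_update(nominal, cost_fn, sigma, n_samples, temperature, rng):
    """One MPPI-style step: perturb, score, and softmax-average the samples."""
    samples = [[u + rng.gauss(0.0, sigma) for u in nominal] for _ in range(n_samples)]
    costs = [cost_fn(s) for s in samples]
    best = min(costs)  # subtract the minimum cost for numerical stability
    weights = [math.exp(-(c - best) / temperature) for c in costs]
    total = sum(weights)
    return [
        sum(w * s[i] for w, s in zip(weights, samples)) / total
        for i in range(len(nominal))
    ]


def annealed_mppi(nominal, cost_fn, sigmas, n_samples=256, temperature=0.1, seed=0):
    """Repeat the update while shrinking the sampling noise (coarse to fine)."""
    rng = random.Random(seed)
    for sigma in sigmas:
        nominal = mppi_update(nominal, cost_fn, sigma, n_samples, temperature, rng)
    return nominal


# Hypothetical toy problem: recover a fixed target control sequence.
target = [1.0, -1.0, 0.5]


def tracking_cost(controls):
    return sum((u - t) ** 2 for u, t in zip(controls, target))


initial = [0.0, 0.0, 0.0]
refined = annealed_mppi(initial, tracking_cost, sigmas=[1.0, 0.5, 0.25, 0.1])
```

Large early noise lets the samples cover the space broadly, while the smaller later noise concentrates them around the current solution, which is one simple way to picture the trade-off between global coverage and local convergence.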
WROOM: An Autonomous Driving Approach for Off-Road Navigation
Kalaria, Dvij, Sharma, Shreya, Bhagat, Sarthak, Xue, Haoru, Dolan, John M.
Off-road navigation is a challenging problem both at the planning level to get a smooth trajectory and at the control level to avoid flipping over, hitting obstacles, or getting stuck at a rough patch. There have been several recent works using classical approaches involving depth map prediction followed by smooth trajectory planning and using a controller to track it. We design an end-to-end reinforcement learning (RL) system for an autonomous vehicle in off-road environments using a custom-designed simulator in the Unity game engine. We warm-start the agent by imitating a rule-based controller and utilize Proximal Policy Optimization (PPO) to improve the policy based on a reward that incorporates Control Barrier Functions (CBF), facilitating the agent's ability to generalize effectively to real-world scenarios. The training involves agents concurrently undergoing domain-randomized trials in various environments. We also propose a novel simulation environment to replicate off-road driving scenarios and deploy our proposed approach on a real buggy RC car. Videos and additional results: https://sites.google.com/view/wroom-utd/home
Deep Learning for Genomics: A Concise Overview
Yue, Tianwei, Wang, Yuanxin, Zhang, Longxiang, Gu, Chunming, Xue, Haoru, Wang, Wenping, Lyu, Qi, Dun, Yujie
Advancements in genomic research such as high-throughput sequencing techniques have driven modern genomic studies into "big data" disciplines. This data explosion is constantly challenging conventional methods used in genomics. In parallel with the urgent demand for robust algorithms, deep learning has succeeded in a variety of fields such as vision, speech, and text processing. Yet genomics entails unique challenges to deep learning since we are expecting from deep learning a superhuman intelligence that explores beyond our knowledge to interpret the genome. A powerful deep learning model should rely on insightful utilization of task-specific knowledge. In this paper, we briefly discuss the strengths of different deep learning models from a genomic perspective so as to fit each particular task with a proper deep architecture, and remark on practical considerations of developing modern deep learning architectures for genomics. We also provide a concise review of deep learning applications in various aspects of genomic research, as well as pointing out potential opportunities and obstacles for future genomics applications.
Learning Model Predictive Control with Error Dynamics Regression for Autonomous Racing
Xue, Haoru, Zhu, Edward L., Borrelli, Francesco
This work presents a novel Learning Model Predictive Control (LMPC) strategy for autonomous racing at the handling limit that can iteratively explore and learn unknown dynamics in high-speed operational domains. We start from existing LMPC formulations and modify the system dynamics learning method. In particular, our approach combines a nominal, global, nonlinear, physics-based model with local, linear, data-driven learning of the error dynamics. We conduct experiments in simulation and on 1/10th-scale hardware, and deploy the proposed LMPC on a full-scale autonomous race car used in the Indy Autonomous Challenge (IAC) with closed-loop experiments at the Putnam Park Road Course in Indiana, USA. The results show that the proposed control policy exhibits improved robustness to parameter tuning and data scarcity. Incremental and safety-aware exploration toward the limit of handling and iterative learning of the vehicle dynamics in high-speed domains are observed both in simulations and experiments.
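The combination of a nominal model with a locally fitted linear error term can be sketched on a scalar toy system. Everything here is an assumption made for illustration: the scalar dynamics, the nearest-neighbour selection, and the choice to regress the error on the state only are hypothetical simplifications, not the paper's formulation.

```python
def nominal_step(x, u):
    """Assumed nominal model (hypothetical scalar dynamics)."""
    return 0.9 * x + 0.5 * u


def fit_local_error(data, x_query, k=5):
    """Fit e ~ a * x + c on the k transitions whose state is nearest x_query.

    data holds (x, u, x_next) tuples; e = x_next - nominal_step(x, u).
    """
    nearest = sorted(data, key=lambda d: abs(d[0] - x_query))[:k]
    xs = [x for x, _, _ in nearest]
    es = [x_next - nominal_step(x, u) for x, u, x_next in nearest]
    x_mean = sum(xs) / len(xs)
    e_mean = sum(es) / len(es)
    var = sum((x - x_mean) ** 2 for x in xs)
    a = sum((x - x_mean) * (e - e_mean) for x, e in zip(xs, es)) / var
    return a, e_mean - a * x_mean


def predict(x, u, data):
    """Nominal prediction plus the locally regressed error correction."""
    a, c = fit_local_error(data, x)
    return nominal_step(x, u) + a * x + c


def true_step(x, u):
    """Hypothetical true system: nominal model plus an unmodelled term."""
    return 0.9 * x + 0.5 * u + 0.1 * x - 0.02


# Recorded transitions from the "true" system at a fixed input.
data = [(0.1 * i, 0.2, true_step(0.1 * i, 0.2)) for i in range(20)]
prediction = predict(0.55, -0.3, data)
```

Because only the residual is learned, the physics-based model still provides the prediction where data is scarce, and the regression needs to capture just the mismatch near the current operating point.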
Spline-Based Minimum-Curvature Trajectory Optimization for Autonomous Racing
Xue, Haoru, Yue, Tianwei, Dolan, John M.
We propose a novel B-spline trajectory optimization method for autonomous racing. We consider the unavailability of sophisticated race car and race track dynamics in early-stage autonomous motorsports development and derive methods that work with limited dynamics data and additional conservative constraints. We formulate a minimum-curvature optimization problem with only the spline control points as optimization variables. We then compare the current state-of-the-art method with our optimization result, which achieves a similar level of optimality with a 90% reduction in the decision variable dimension, and in addition offers a mathematical smoothness guarantee and flexible manipulation options. We concurrently reduce the problem computation time from seconds to milliseconds for a long race track, enabling future online adaptation of the previously offline technique.
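As an illustration of a curvature objective that depends only on spline control points, the sketch below evaluates the curvature of a uniform cubic B-spline and sums its square over sampled points. The uniform cubic basis, the sampling density, and the function names are assumptions; the paper's exact spline parameterization, constraints, and solver are not given in the abstract and are not reproduced here.

```python
def bspline_derivatives(ctrl, t):
    """First and second derivatives of one uniform cubic B-spline segment.

    ctrl is four consecutive 2D control points; t is in [0, 1].
    """
    d1 = (-(1 - t) ** 2 / 2, (3 * t * t - 4 * t) / 2,
          (-3 * t * t + 2 * t + 1) / 2, t * t / 2)
    d2 = (1 - t, 3 * t - 2, -3 * t + 1, t)
    first = tuple(sum(b * p[axis] for b, p in zip(d1, ctrl)) for axis in range(2))
    second = tuple(sum(b * p[axis] for b, p in zip(d2, ctrl)) for axis in range(2))
    return first, second


def curvature(ctrl, t):
    """Signed curvature (x'y'' - y'x'') / (x'^2 + y'^2)^(3/2) at parameter t."""
    (dx, dy), (ddx, ddy) = bspline_derivatives(ctrl, t)
    return (dx * ddy - dy * ddx) / (dx * dx + dy * dy) ** 1.5


def curvature_cost(control_points, samples_per_segment=10):
    """Sum of squared curvature over all segments of the spline."""
    total = 0.0
    for i in range(len(control_points) - 3):
        segment = control_points[i:i + 4]
        for j in range(samples_per_segment):
            total += curvature(segment, j / samples_per_segment) ** 2
    return total


# Hypothetical control polygons: a straight line and a bent path.
straight = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
bent = [(0.0, 0.0), (1.0, 0.0), (2.0, 1.0), (3.0, 3.0), (4.0, 3.0)]
```

A cost of this form could be handed to a generic optimizer with the control points as the only decision variables, and a cubic B-spline is twice continuously differentiable by construction, so the curvature stays continuous along the path.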