Pan, Wei
TAR: Teacher-Aligned Representations via Contrastive Learning for Quadrupedal Locomotion
Mousa, Amr, Karavis, Neil, Caprio, Michele, Pan, Wei, Allmendinger, Richard
Quadrupedal locomotion via Reinforcement Learning (RL) is commonly addressed using the teacher-student paradigm, where a privileged teacher guides a proprioceptive student policy. However, key challenges, such as representation misalignment between the privileged teacher and the proprioceptive-only student, covariate shift induced by behavioral cloning, and the lack of deployable adaptation, lead to poor generalization in real-world scenarios. We propose Teacher-Aligned Representations via Contrastive Learning (TAR), a framework that leverages privileged information with self-supervised contrastive learning to bridge this gap. By aligning representations to a privileged teacher in simulation via contrastive objectives, our student policy learns structured latent spaces and exhibits robust generalization to Out-of-Distribution (OOD) scenarios, surpassing the fully privileged "Teacher". TAR reaches peak performance twice as fast as state-of-the-art baselines and generalizes 40% better on average in OOD scenarios than existing methods. Open-source code and videos are available at https://ammousa.github.io/TARLoco/.
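A minimal sketch of the alignment step described in the abstract: an InfoNCE-style contrastive loss that pulls student embeddings toward the teacher embeddings of the same simulated transitions. The encoder architectures, observation dimensions, and temperature below are illustrative assumptions, not the paper's actual design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative encoders: the teacher sees privileged state (e.g. 96-D),
# the student sees proprioception only (e.g. 48-D); sizes are assumptions.
teacher_enc = nn.Sequential(nn.Linear(96, 128), nn.ELU(), nn.Linear(128, 32))
student_enc = nn.Sequential(nn.Linear(48, 128), nn.ELU(), nn.Linear(128, 32))

def contrastive_alignment_loss(priv_obs, prop_obs, temperature=0.1):
    """InfoNCE: pull the student embedding toward the teacher embedding of
    the same transition, push it away from the rest of the batch."""
    z_t = F.normalize(teacher_enc(priv_obs), dim=-1)
    z_s = F.normalize(student_enc(prop_obs), dim=-1)
    logits = z_s @ z_t.t() / temperature        # (B, B) similarity matrix
    labels = torch.arange(z_s.size(0))          # positives on the diagonal
    return F.cross_entropy(logits, labels)

# Toy usage on a batch of paired simulation observations.
loss = contrastive_alignment_loss(torch.randn(256, 96), torch.randn(256, 48))
loss.backward()
```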
Explosive Jumping with Rigid and Articulated Soft Quadrupeds via Example Guided Reinforcement Learning
Apostolides, Georgios, Pan, Wei, Kober, Jens, Della Santina, Cosimo, Ding, Jiatao
Achieving controlled jumping behaviour for a quadruped robot is a challenging task, especially when introducing passive compliance into the mechanical design. This study addresses this challenge via imitation-based deep reinforcement learning with a progressive training process. To start, we learn the jumping skill by mimicking a coarse jumping example generated by model-based trajectory optimization. Subsequently, we generalize the learned policy to broader situations, including various distances in both forward and lateral directions, and then pursue robust jumping on unknown ground unevenness. In addition, with minimal reward tuning, we learn the jumping policy for a quadruped with parallel elasticity. Results show that using the proposed method, i) the robot learns versatile jumps from only a single demonstration, ii) the robot with parallel compliance reduces the landing error by 11.1%, saves energy cost by 15.2%, and reduces the peak torque by 15.8%, compared to the rigid robot without parallel elasticity, and iii) the robot can perform jumps of variable distances with robustness against ground unevenness (height perturbations of up to 4 cm) using only proprioceptive perception.
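A minimal sketch of the example-guided reward idea: a Gaussian tracking reward toward a single demonstrated trajectory. The reference here is a synthetic parabola standing in for the trajectory-optimization output, and the reward shape and width are assumptions.

```python
import numpy as np

# Hypothetical reference: base height over one jump phase, standing in for a
# demonstration produced by model-based trajectory optimization.
T = 100
phase = np.linspace(0.0, 1.0, T)
ref_height = 0.30 + 0.25 * (1.0 - (2.0 * phase - 1.0) ** 2)

def imitation_reward(step, base_height, sigma=0.05):
    """Gaussian tracking reward: near 1 when the measured base height matches
    the demonstrated height at this phase of the jump, decaying with error."""
    err = base_height - ref_height[step]
    return float(np.exp(-(err / sigma) ** 2))

# A robot at 0.54 m near the demonstrated apex (0.55 m) scores close to 1.
print(imitation_reward(step=50, base_height=0.54))
```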
Safe Distributed Learning-Enhanced Predictive Control for Multiple Quadrupedal Robots
Zhan, Weishu, Liang, Zheng, Song, Hongyu, Pan, Wei
Quadrupedal robots exhibit remarkable adaptability in unstructured environments, making them well-suited for formation control in real-world applications. However, maintaining stable formations while ensuring collision-free navigation presents significant challenges due to dynamic obstacles, communication constraints, and the complexity of legged locomotion. This paper proposes a distributed model predictive control framework for multi-quadruped formation control, integrating Control Lyapunov Functions to ensure formation stability and Control Barrier Functions for decentralized safety enforcement. To address the challenge of dynamically changing team structures, we introduce Scale-Adaptive Permutation-Invariant Encoding (SAPIE), which enables robust feature encoding of neighboring robots while preserving permutation invariance. Additionally, we develop a low-latency Data Distribution Service-based communication protocol and an event-triggered deadlock resolution mechanism to enhance real-time coordination and prevent motion stagnation in constrained spaces. Our framework is validated through high-fidelity simulations in NVIDIA Omniverse Isaac Sim and real-world experiments using our custom quadrupedal robotic system, XG. Results demonstrate stable formation control, real-time feasibility, and effective collision avoidance, validating its potential for large-scale deployment.
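A sketch of the permutation-invariance property at the heart of a SAPIE-style neighbor encoder: a shared per-neighbor MLP followed by symmetric pooling, so the encoding is unchanged under reordering and copes with a varying number of neighbors. Layer sizes and the 6-D neighbor feature (e.g. relative position and velocity) are assumptions, and the scale-adaptive part of SAPIE is not modeled here.

```python
import torch
import torch.nn as nn

class PermInvariantEncoder(nn.Module):
    def __init__(self, feat_dim=6, hidden=64, out_dim=32):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, out_dim))

    def forward(self, neighbors):          # (N, feat_dim), N may vary
        h = self.phi(neighbors)            # shared weights for every neighbor
        return h.mean(dim=0)               # symmetric pooling -> (out_dim,)

enc = PermInvariantEncoder()
x = torch.randn(5, 6)                      # features of 5 neighboring robots
perm = x[torch.randperm(5)]
assert torch.allclose(enc(x), enc(perm), atol=1e-6)  # order does not matter
```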
Adaptive Teaming in Multi-Drone Pursuit: Simulation, Training, and Deployment
Li, Yang, Chen, Junfan, Xue, Feng, Qiu, Jiabin, Li, Wenbin, Zhang, Qingrui, Wen, Ying, Pan, Wei
Adaptive teaming, the ability to collaborate with unseen teammates without prior coordination, remains an underexplored challenge in multi-robot collaboration. This paper focuses on adaptive teaming in multi-drone cooperative pursuit, a critical task with real-world applications such as border surveillance, search-and-rescue, and counter-terrorism. We first define and formalize the Adaptive Teaming in Multi-Drone Pursuit (AT-MDP) problem and introduce the AT-MDP framework, which integrates simulation, algorithm training, and real-world deployment. The AT-MDP framework provides a flexible experiment configurator and interface for simulation, a distributed training framework with an extensive algorithm zoo (including two newly proposed baseline methods) and an unseen drone zoo for evaluating adaptive teaming, as well as a real-world deployment system that utilizes edge computing and Crazyflie drones. To the best of our knowledge, the AT-MDP framework is the first adaptive framework for continuous-action decision-making in complex real-world drone tasks, enabling multiple drones to coordinate effectively with unseen teammates. Extensive experiments in four multi-drone pursuit environments of increasing difficulty confirm the effectiveness of the AT-MDP framework, while real-world deployments further validate its feasibility in physical systems. Videos and code are available at https://sites.google.com/view/at-mdp.
SpikingSoft: A Spiking Neuron Controller for Bio-inspired Locomotion with Soft Snake Robots
Zhang, Chuhan, Wang, Cong, Pan, Wei, Della Santina, Cosimo
Inspired by the dynamic coupling of moto-neurons and physical elasticity in animals, this work explores the possibility of generating locomotion gaits by utilizing physical oscillations in a soft snake by means of a low-level spiking neural mechanism. To achieve this goal, we introduce the Double Threshold Spiking neuron model with adjustable thresholds to generate varied output patterns. This neuron model can excite the natural dynamics of soft robotic snakes, and it enables distinct movements, such as turning or moving forward, by simply altering the neural thresholds. Finally, we demonstrate that our approach, termed SpikingSoft, naturally pairs and integrates with reinforcement learning. The high-level agent only needs to adjust the two thresholds to generate complex movement patterns, thus strongly simplifying the learning of reactive locomotion. Simulation results demonstrate that the proposed architecture significantly enhances the performance of the soft snake robot, enabling it to achieve target objectives with a 21.6% increase in success rate, a 29% reduction in time to reach the target, and smoother movements compared to vanilla reinforcement learning controllers or a Central Pattern Generator controller acting in torque space.
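A minimal sketch of a two-threshold leaky integrate-and-fire unit in the spirit of the Double Threshold Spiking model: crossing the upper threshold emits a positive spike, crossing the lower threshold emits a negative one, and the membrane resets. All parameter values are illustrative assumptions.

```python
import numpy as np

def double_threshold_neuron(inputs, theta_hi=1.0, theta_lo=-1.0, leak=0.9):
    """Leaky integration with two adjustable firing thresholds."""
    v, spikes = 0.0, []
    for x in inputs:
        v = leak * v + x                  # leaky integration of input current
        if v >= theta_hi:
            spikes.append(+1); v = 0.0    # positive spike and reset
        elif v <= theta_lo:
            spikes.append(-1); v = 0.0    # negative spike and reset
        else:
            spikes.append(0)
    return spikes

# Shifting the thresholds reshapes the spike pattern, which is what a
# high-level RL agent would exploit to switch between movement patterns.
drive = np.sin(np.linspace(0, 8 * np.pi, 200))
print(sum(s != 0 for s in double_threshold_neuron(drive, theta_hi=0.5)))
```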
Toward Scalable Multirobot Control: Fast Policy Learning in Distributed MPC
Zhang, Xinglong, Pan, Wei, Li, Cong, Xu, Xin, Wang, Xiangke, Zhang, Ronghua, Hu, Dewen
Distributed model predictive control (DMPC) is promising in achieving optimal cooperative control in multirobot systems (MRS). However, real-time DMPC implementation relies on numerical optimization tools to periodically calculate local control sequences online. This process is computationally demanding and lacks scalability for large-scale, nonlinear MRS. This article proposes a novel distributed learning-based predictive control (DLPC) framework for scalable multirobot control. Unlike conventional DMPC methods that calculate open-loop control sequences, our approach centers around a computationally fast and efficient distributed policy learning algorithm that generates explicit closed-loop DMPC policies for MRS without using numerical solvers. The policy learning is executed incrementally and forward in time in each prediction interval through an online distributed actor-critic implementation. The control policies are successively updated in a receding-horizon manner, enabling fast and efficient policy learning with a closed-loop stability guarantee. The learned control policies can be deployed online to MRS with varying robot scales, enhancing scalability and transferability for large-scale MRS. Furthermore, we extend our methodology to address the multirobot safe learning challenge through a force field-inspired policy learning approach. We validate our approach's effectiveness, scalability, and efficiency through extensive experiments on cooperative tasks of large-scale wheeled robots and multirotor drones. Our results demonstrate the rapid learning and deployment of DMPC policies for MRS with scales up to 10,000 units.
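The central mechanism is learning an explicit feedback policy from rollouts over the prediction horizon instead of calling a numerical solver each interval. The sketch below illustrates that idea in the simplest setting: a single toy double-integrator robot whose linear feedback gain is improved from horizon costs by a zeroth-order (two-point) gradient estimate. This is a generic solver-free stand-in, not the paper's distributed actor-critic algorithm, and all dynamics, costs, and rates are assumptions.

```python
import numpy as np

# Toy double-integrator "robot"; dynamics, costs, and rates are assumptions.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q, R = np.eye(2), 0.1 * np.eye(1)

def horizon_cost(K, x0, T=30):
    """Accumulated quadratic cost of the explicit policy u = -K x over one
    prediction horizon, evaluated by rollout instead of a numerical solver."""
    x, c = x0.copy(), 0.0
    for _ in range(T):
        u = -K @ x
        c += x @ Q @ x + u @ R @ u
        x = A @ x + B @ u
    return c

K = np.zeros((1, 2))                               # explicit feedback policy
rng = np.random.default_rng(0)
for _ in range(500):
    x0 = rng.standard_normal(2)
    eps = rng.standard_normal((1, 2))
    jp, jm = horizon_cost(K + 0.05 * eps, x0), horizon_cost(K - 0.05 * eps, x0)
    step = np.clip((jp - jm) / 0.1, -50.0, 50.0)   # two-point gradient estimate
    K -= 1e-3 * step * eps                         # incremental policy update
print("learned feedback gain:", K)
```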
LoKO: Low-Rank Kalman Optimizer for Online Fine-Tuning of Large Models
Abdi, Hossein, Sun, Mingfei, Zhang, Andi, Kaski, Samuel, Pan, Wei
Training large models with millions or even billions of parameters from scratch incurs substantial computational costs. Parameter Efficient Fine-Tuning (PEFT) methods, particularly Low-Rank Adaptation (LoRA), address this challenge by adapting only a reduced number of parameters to specific tasks with gradient-based optimizers. In this paper, we cast PEFT as an optimal filtering/state estimation problem and present the Low-Rank Kalman Optimizer (LoKO) to estimate the optimal trainable parameters in an online manner. We leverage the low-rank decomposition in LoRA to significantly reduce matrix sizes in Kalman iterations and further capitalize on a diagonal approximation of the covariance matrix to effectively decrease computational complexity from quadratic to linear in the number of trainable parameters. Moreover, we discovered that the initialization of the covariance matrix within the Kalman algorithm and the accurate estimation of the observation noise covariance are key to this formulation, and we propose robust approaches that work well across a vast range of well-established computer vision and language models. Our results show that LoKO converges with fewer iterations and yields better-performing models compared to commonly used optimizers with LoRA in both image classification and language tasks. Our study opens up the possibility of leveraging the Kalman filter as an effective optimizer for the online fine-tuning of large models.
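A minimal sketch of the core trick: a Kalman filter with a diagonal covariance used as an online optimizer, which makes each update linear in the number of trainable parameters. It is shown here on a toy linear-regression stream rather than LoRA factors, and the initial covariance and observation-noise values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
w_true = rng.standard_normal(8)
w = np.zeros(8)                   # trainable parameters (the filter's state)
p = np.ones(8)                    # diagonal of the covariance matrix P
R = 0.1                           # observation-noise variance estimate

for step in range(500):
    x = rng.standard_normal(8)             # input features
    y = x @ w_true + 0.1 * rng.standard_normal()
    h = x                                  # Jacobian of the prediction w.r.t. w
    s = (h * p) @ h + R                    # innovation variance (scalar)
    k = p * h / s                          # Kalman gain, O(n) with diagonal P
    w += k * (y - x @ w)                   # state update = parameter update
    p = (1.0 - k * h) * p                  # diagonal covariance update

print("parameter error:", np.linalg.norm(w - w_true))
```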
Modular Adaptive Aerial Manipulation under Unknown Dynamic Coupling Forces
Yadav, Rishabh Dev, Dantu, Swati, Pan, Wei, Sun, Sihao, Roy, Spandan, Baldi, Simone
Successful aerial manipulation largely depends on how effectively a controller can tackle the coupling dynamic forces between the aerial vehicle and the manipulator. However, this control problem has remained largely unsolved, as existing control approaches either require precise knowledge of the aerial vehicle/manipulator inertial couplings or neglect the state-dependent uncertainties, especially those arising during the interaction phase. This work proposes an adaptive control solution to overcome this long-standing control challenge without any a priori knowledge of the coupling dynamic terms. Additionally, in contrast to existing adaptive control solutions, the proposed control framework is modular: it allows independent tuning of the adaptive gains for the vehicle position sub-dynamics, the vehicle attitude sub-dynamics, and the manipulator sub-dynamics. Stability of the closed loop under the proposed scheme is derived analytically, and real-time experiments validate the effectiveness of the proposed scheme over state-of-the-art approaches.
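As a loose illustration of the modular idea, the toy sketch below runs the same scalar adaptive law on three independent "subsystems", each with its own adaptation rate, compensating an unknown bounded coupling force without a model of it. The dynamics, sliding variable, and gains are invented for illustration and are not the paper's controller.

```python
import numpy as np

def run_subsystem(gamma, steps=4000, dt=0.005):
    """Toy scalar subsystem x_ddot = u + d with an unknown bounded coupling
    force d; the adaptive gain rho grows with |s| at module-specific rate gamma."""
    x, v, rho = 1.0, 0.0, 0.0
    for k in range(steps):
        d = 2.0 * np.sin(0.05 * k)               # unknown coupling disturbance
        s = v + x                                # sliding variable
        u = -5.0 * s - rho * np.tanh(s / 0.05)   # smooth switching term
        rho += dt * gamma * abs(s)               # adaptive law, model-free
        v += dt * (u + d)
        x += dt * v
    return abs(x), rho

# Each module tunes its own adaptation rate independently of the others.
for name, gamma in [("position", 2.0), ("attitude", 20.0), ("manipulator", 5.0)]:
    err, rho = run_subsystem(gamma)
    print(f"{name:12s} final |x| = {err:.4f}, adapted gain = {rho:.2f}")
```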
Spatial-aware decision-making with ring attractors in reinforcement learning systems
Saura, Marcos Negre, Allmendinger, Richard, Papamarkou, Theodore, Pan, Wei
This paper explores the integration of ring attractors (Kim et al., 2017), a mathematical model inspired by neural circuit dynamics, into the reinforcement learning (RL) action selection process. Ring attractors, as specialized brain-inspired structures that encode spatial information and uncertainty, offer a biologically plausible mechanism to improve learning speed and predictive performance. They do so by explicitly encoding the action space, facilitating the organization of neural activity, and enabling the distribution of spatial representations across the neural network in the context of deep RL. The application of ring attractors in the RL action selection process involves mapping actions to specific locations on the ring and decoding the selected action based on neural activity. We investigate the application of ring attractors both by building them as exogenous models and by integrating them into a deep RL policy algorithm. Our results show a significant improvement over state-of-the-art models on the Atari 100k benchmark, with our integrated approach yielding a 53% performance increase over selected baselines. This integration improves spatial awareness in action selection and provides a mechanism for uncertainty-aware decision making, leading to more accurate and efficient learning in environments with spatial structure.
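A minimal sketch of the encode-and-decode step described above, assuming a discrete action set, a von Mises activity bump per action, and population-vector decoding; the recurrent attractor dynamics and the deep RL integration are omitted, and all values are illustrative.

```python
import numpy as np

n_actions, n_neurons = 8, 64
angles = np.linspace(0, 2 * np.pi, n_neurons, endpoint=False)
action_angles = np.linspace(0, 2 * np.pi, n_actions, endpoint=False)

def ring_activity(q_values, kappa=4.0):
    """Each action injects a von-Mises bump of activity at its ring location,
    scaled by its value estimate (recurrent dynamics are abstracted away)."""
    bumps = np.exp(kappa * np.cos(angles[None, :] - action_angles[:, None]))
    return q_values @ bumps                      # (n_neurons,) population code

def decode_action(activity):
    """Population-vector decoding: the angular mean of the bump picks the action."""
    mean_angle = np.angle(np.sum(activity * np.exp(1j * angles)))
    diffs = np.angle(np.exp(1j * (action_angles - mean_angle)))
    return int(np.argmin(np.abs(diffs)))

# Two neighboring high-value actions form one bump between their locations,
# so the decoded choice reflects spatial structure, not just a hard argmax.
q = np.array([0.1, 0.2, 1.6, 1.2, 0.3, 0.1, 0.0, 0.2])
print(decode_action(ring_activity(q)))          # -> 2, pulled slightly toward 3
```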
Adaptive Advantage-Guided Policy Regularization for Offline Reinforcement Learning
Liu, Tenglong, Li, Yang, Lan, Yixing, Gao, Hao, Pan, Wei, Xu, Xin
In offline reinforcement learning, the challenge of out-of-distribution (OOD) actions is pronounced. To address this, existing methods often constrain the learned policy through policy regularization. However, these methods often suffer from unnecessary conservativeness, hampering policy improvement. This occurs due to the indiscriminate use of all actions from the behavior policy that generates the offline dataset as constraints. The problem becomes particularly noticeable when the quality of the dataset is suboptimal. Thus, we propose Adaptive Advantage-guided Policy Regularization (A2PR), obtaining high-advantage actions from an augmented behavior policy combined with a VAE to guide the learned policy. A2PR can select high-advantage actions that differ from those present in the dataset, while still effectively maintaining conservatism toward OOD actions. This is achieved by harnessing the VAE's capacity to generate samples matching the distribution of the data points. We theoretically prove that the improvement of the behavior policy is guaranteed. In addition, A2PR effectively mitigates value overestimation with a bounded performance gap. Empirically, we conduct a series of experiments on the D4RL benchmark, where A2PR demonstrates state-of-the-art performance. Furthermore, experimental results on additional suboptimal mixed datasets reveal that A2PR exhibits superior performance. Code is available at https://github.com/ltlhuuu/A2PR.
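A minimal sketch of advantage-guided regularization in the spirit of A2PR: the behavior-cloning constraint is applied only toward whichever candidate action (from the dataset or from a generator, a VAE in the paper) has the higher advantage. The networks, loss weighting, and stand-in generator below are assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def a2pr_policy_loss(policy, critic, value_fn, states, dataset_actions,
                     generated_actions, bc_weight=1.0):
    """Q-improvement plus behavior cloning toward the candidate action
    (dataset vs. generated) with the higher advantage A = Q - V."""
    with torch.no_grad():
        adv_data = (critic(states, dataset_actions) - value_fn(states)).squeeze(-1)
        adv_gen = (critic(states, generated_actions) - value_fn(states)).squeeze(-1)
        pick_gen = (adv_gen > adv_data).float().unsqueeze(-1)
        target = pick_gen * generated_actions + (1 - pick_gen) * dataset_actions
    pi_actions = policy(states)
    q_term = -critic(states, pi_actions).mean()     # push toward higher Q
    bc_term = F.mse_loss(pi_actions, target)        # stay near high-advantage data
    return q_term + bc_weight * bc_term

# Toy usage with stand-in networks (the paper's generator is a VAE).
policy = torch.nn.Linear(4, 2)
critic = lambda s, a: s.sum(-1, keepdim=True) + a.sum(-1, keepdim=True)
value_fn = lambda s: s.sum(-1, keepdim=True)
s, a_data, a_gen = torch.randn(32, 4), torch.randn(32, 2), torch.randn(32, 2)
a2pr_policy_loss(policy, critic, value_fn, s, a_data, a_gen).backward()
```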