Wang, Xueqian
Learning Language-Conditioned Deformable Object Manipulation with Graph Dynamics
Deng, Yuhong, Mo, Kai, Xia, Chongkun, Wang, Xueqian
Multi-task learning of deformable object manipulation is a challenging problem in robot manipulation. Most previous works address this problem in a goal-conditioned way, adopting goal images to specify different tasks, which limits multi-task learning performance and cannot generalize to new tasks. We instead adopt language instructions to specify deformable object manipulation tasks and propose a learning framework. We first design a unified Transformer-based architecture that understands multi-modal data and outputs picking and placing actions. In addition, we introduce the visible connectivity graph to tackle the nonlinear dynamics and complex configurations of deformable objects. Both simulated and real experiments demonstrate that the proposed method is effective and can generalize to unseen instructions and tasks. Compared with the state-of-the-art method, our method achieves higher success rates (87.2% on average) and a 75.6% shorter inference time. We also demonstrate that our method performs well in real-world experiments.
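To make the described pipeline concrete, below is a minimal sketch of a language-conditioned pick-and-place policy in the spirit of the abstract: a shared Transformer encoder fuses instruction tokens with graph-node features, then scores per-node pick actions and regresses place offsets. All module names, feature sizes, and the 3-dimensional node encoding are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch (not the authors' code): a shared Transformer fusing language
# tokens and graph-node features, scoring pick/place actions per node.
import torch
import torch.nn as nn

class LangCondPickPlace(nn.Module):
    def __init__(self, d_model=128, vocab=1000):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab, d_model)        # instruction tokens
        self.node_proj = nn.Linear(3, d_model)             # node (x, y, visibility)
        enc = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, num_layers=2)
        self.pick_head = nn.Linear(d_model, 1)             # per-node pick score
        self.place_head = nn.Linear(d_model, 2)            # (dx, dy) place offset

    def forward(self, tokens, nodes):
        # tokens: (B, L) int64; nodes: (B, N, 3) float
        x = torch.cat([self.tok_emb(tokens), self.node_proj(nodes)], dim=1)
        h = self.encoder(x)
        node_h = h[:, tokens.shape[1]:]                    # keep only node slots
        pick_logits = self.pick_head(node_h).squeeze(-1)   # (B, N)
        place_delta = self.place_head(node_h)              # (B, N, 2)
        return pick_logits, place_delta

model = LangCondPickPlace()
toks = torch.randint(0, 1000, (2, 8))
nodes = torch.rand(2, 64, 3)
pick, place = model(toks, nodes)
print(pick.shape, place.shape)  # torch.Size([2, 64]) torch.Size([2, 64, 2])
```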
PPNet: A Novel Neural Network Structure for End-to-End Near-Optimal Path Planning
Meng, Qinglong, Xia, Chongkun, Wang, Xueqian, Mai, Songping, Liang, Bin
Classical path planners, such as sampling-based planners, suffer from sensitivity to the initial solution and slow convergence to the optimal solution, yet finding a near-optimal solution quickly is essential in many applications, such as autonomous vehicles with limited power or fuel. To achieve an end-to-end near-optimal path planner, we first divide the path planning problem into two subproblems: path space segmentation and waypoint generation within the given path space. We then propose a two-level cascade neural network named Path Planning Network (PPNet) that solves the path planning problem by solving these two subproblems. Moreover, we propose EDaGe-PP, a novel, efficient data generation method for path planning. The results show that EDaGe-PP requires less than 1/33 of the total computation time of other methods, and PPNet trained on data generated by EDaGe-PP achieves about $2 \times$ their success rate. We validate PPNet against state-of-the-art path planning methods; the results show that PPNet finds a near-optimal solution in 15.3 ms, much faster than state-of-the-art path planners.
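A minimal sketch of the two-level cascade described above, assuming a grid-map input with start/goal channels: the first stage segments a path space, and the second regresses waypoints conditioned on that segmentation. Layer sizes, the number of waypoints, and both class names are hypothetical, not the released PPNet.

```python
# Illustrative two-level cascade: space segmentation, then waypoint generation.
import torch
import torch.nn as nn

class SpaceSegNet(nn.Module):
    """Stage 1: predict a path-space mask from map + start/goal channels."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 1),  # logits of the segmented path space
        )

    def forward(self, x):
        return self.net(x)

class WaypointNet(nn.Module):
    """Stage 2: regress K waypoints inside the segmented space."""
    def __init__(self, k=8):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(4, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(16, 2 * k)  # (x, y) per waypoint

    def forward(self, x, mask_logits):
        z = torch.cat([x, torch.sigmoid(mask_logits)], dim=1)
        return self.head(self.backbone(z)).view(x.shape[0], -1, 2)

maps = torch.rand(1, 3, 64, 64)      # obstacle map + start + goal channels
mask = SpaceSegNet()(maps)
waypoints = WaypointNet()(maps, mask)
print(waypoints.shape)               # torch.Size([1, 8, 2])
```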
Polynomial-time Approximation Scheme for Equilibriums of Games
Sun, Hongbo, Xia, Chongkun, Yuan, Bo, Wang, Xueqian, Liang, Bin
Nash equilibrium [1] of normal-form games was proposed decades ago, yet whether a PTAS exists for it remains undecided, let alone for equilibriums of games with dynamics. A PTAS for equilibriums of games is important in its own right in game theory, and confirming its existence may impact multi-agent reinforcement learning research. First, the existence of a PTAS determines whether the computational power required to achieve equilibriums of large-scale games is practical. It has been proved that exactly computing a Nash equilibrium of a static game is PPAD-hard [2]. Setting aside the possibility that PPAD itself is solvable in polynomial time [3], a PTAS describes methods that approximately compute Nash equilibriums efficiently. Second, confirming the previously unknown existence of a PTAS for games implies the possibility of simultaneously and fundamentally solving the problems of non-stationarity in training and the curse of dimensionality [4] in multi-agent reinforcement learning. Both problems are tied to the absence of a PTAS for equilibriums of games: non-stationarity in training stems from the fact that existing polynomial-time methods lack convergence guarantees to equilibriums, and the curse of dimensionality stems from the fact that methods with convergence guarantees lack polynomial-time complexity.
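For reference, the standard definitions the abstract leans on, in textbook notation rather than the paper's own:

```latex
% Standard textbook definitions (not taken from the paper itself). A mixed
% strategy profile x = (x_1, ..., x_n) is an epsilon-approximate Nash
% equilibrium if no player can gain more than epsilon by deviating
% unilaterally:
\[
  u_i(x_i, x_{-i}) \;\ge\; u_i(x_i', x_{-i}) - \epsilon
  \qquad \forall i,\ \forall x_i' \in \Delta(S_i).
\]
% A PTAS is an algorithm that, for every fixed epsilon > 0, computes such a
% profile in time polynomial in the size of the game (the exponent of the
% polynomial may depend on epsilon).
```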
Path Generation for Wheeled Robots Autonomous Navigation on Vegetated Terrain
Jian, Zhuozhu, Liu, Zejia, Shao, Haoyu, Wang, Xueqian, Chen, Xinlei, Liang, Bin
Wheeled robot navigation has been widely deployed in urban environments, but little research addresses navigation through wild vegetation. External sensors (LiDAR, camera, etc.) are often used to construct a point cloud map of the surrounding environment; however, the rigid supporting ground cannot be detected due to occlusion by vegetation, which often leads to unsafe or non-smooth paths during planning. To address this drawback, we propose the PE-RRT* algorithm, which combines a novel support plane estimation method with a sampling algorithm to generate feasible and safe paths in vegetated environments in real time. To estimate the support plane accurately, we combine external perception and proprioception and use Multivariate Gaussian Process Regression (MV-GPR) to estimate the terrain at the sampling nodes. We build a physical experimental platform and conduct experiments in different outdoor environments. Experimental results show that our method achieves high safety, robustness, and generalization.
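A minimal sketch of the terrain-estimation step, with scikit-learn's standard Gaussian process regressor standing in for MV-GPR: fit the GP to scattered height samples, query height and uncertainty at a candidate sampling node, and recover a local support-plane normal from finite-difference gradients. The kernel choice and synthetic data are assumptions.

```python
# GP terrain estimation at an RRT* sampling node (illustrative stand-in).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
xy = rng.uniform(0, 5, size=(80, 2))              # (x, y) of height samples
z = 0.1 * xy[:, 0] + 0.05 * np.sin(xy[:, 1]) + rng.normal(0, 0.01, 80)

gpr = GaussianProcessRegressor(kernel=RBF(1.0) + WhiteKernel(1e-3))
gpr.fit(xy, z)

node = np.array([[2.5, 2.5]])                     # candidate sampling node
z_hat, z_std = gpr.predict(node, return_std=True)

# Local plane normal from GP gradients via finite differences.
eps = 1e-3
dzdx = (gpr.predict(node + [eps, 0]) - z_hat) / eps
dzdy = (gpr.predict(node + [0, eps]) - z_hat) / eps
normal = np.array([-dzdx[0], -dzdy[0], 1.0])
normal /= np.linalg.norm(normal)
print(f"height={z_hat[0]:.3f} +/- {z_std[0]:.3f}, plane normal={normal}")
```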
Hybrid Trajectory Optimization for Autonomous Terrain Traversal of Articulated Tracked Robots
Xu, Zhengzhe, Chen, Yanbo, Jian, Zhuozhu, Tan, Junbo, Wang, Xueqian, Liang, Bin
Autonomous terrain traversal of articulated tracked robots can reduce operator cognitive load, enhance task efficiency, and facilitate extensive deployment. We present a novel hybrid trajectory optimization method aimed at generating efficient, stable, and smooth traversal motions. To achieve this, we develop a planar robot-terrain contact model and divide the robot's motion into hybrid modes of driving and traversing. By using a generalized coordinate description, the dimension of the configuration space is reduced, which facilitates real-time planning. The hybrid trajectory optimization is transcribed into a nonlinear programming problem and divided into subproblems that are solved in a receding-horizon fashion. Mode switching is facilitated by associating optimized motion durations with a predefined traversal sequence. A multi-objective cost function is formulated to further improve the traversal performance. Additionally, map sampling, terrain simplification, and tracking controller modules are integrated into the autonomous terrain traversal system. Our approach is validated in simulation and real-world scenarios on the Searcher robotic platform. Comparative experiments with expert operator control and state-of-the-art methods show advantages in time and energy efficiency, stability, and smoothness of motion.
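A toy receding-horizon loop illustrating the planning pattern the abstract describes, not the paper's NLP transcription: optimize a short horizon of articulation-angle waypoints against a multi-objective cost (smoothness, stability tracking, effort), execute the first action, and re-plan. All cost terms, bounds, and the terrain reference are invented for illustration.

```python
# Receding-horizon optimization of a toy multi-objective traversal cost.
import numpy as np
from scipy.optimize import minimize

def cost(angles, angle_now, pitch_ref):
    seq = np.concatenate([[angle_now], angles])
    smoothness = np.sum(np.diff(seq) ** 2)          # penalize jerky motion
    stability = np.sum((angles - pitch_ref) ** 2)   # track a terrain-derived ref
    effort = 0.01 * np.sum(angles ** 2)
    return smoothness + stability + effort

horizon, angle = 5, 0.0
for step in range(3):                               # re-plan at every step
    pitch_ref = 0.3 * np.sin(0.5 * (step + np.arange(horizon)))  # fake terrain
    res = minimize(cost, np.zeros(horizon), args=(angle, pitch_ref),
                   bounds=[(-0.8, 0.8)] * horizon)
    angle = res.x[0]                                # execute first action only
    print(f"step {step}: commanded articulation angle = {angle:.3f} rad")
```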
Replay-enhanced Continual Reinforcement Learning
Zhang, Tiantian, Shen, Kevin Zehua, Lin, Zichuan, Yuan, Bo, Wang, Xueqian, Li, Xiu, Ye, Deheng
Replaying past experiences has proven highly effective at averting catastrophic forgetting in supervised continual learning. However, some crucial factors are still largely ignored, leaving replay vulnerable to serious failure when used as a solution to forgetting in continual reinforcement learning, even with perfect memory, where all data of previous tasks are accessible in the current task. On the one hand, since most reinforcement learning algorithms are not invariant to the reward scale, previously well-learned tasks (with high rewards) may appear more salient to the current learning process than the current task (with small initial rewards). This causes the agent to concentrate on those salient tasks at the expense of generality on the current task. On the other hand, offline learning on replayed tasks while learning a new task may induce a distributional shift between the dataset and the learned policy on old tasks, resulting in forgetting. In this paper, we introduce RECALL, a replay-enhanced method that greatly improves the plasticity of existing replay-based methods on new tasks while effectively avoiding the recurrence of catastrophic forgetting in continual reinforcement learning. RECALL leverages adaptive normalization on approximate targets and policy distillation on old tasks to enhance generality and stability, respectively. Extensive experiments on the Continual World benchmark show that RECALL performs significantly better than purely perfect memory replay and achieves comparable or better overall performance than state-of-the-art continual learning methods.
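A minimal sketch of RECALL's two ingredients as the abstract states them, target normalization and policy distillation; the running-statistics update and the KL formulation are assumptions, not the authors' code.

```python
# (1) Normalize value targets so large-reward old tasks do not dominate;
# (2) distill frozen old-task policies on replayed states to limit forgetting.
import torch
import torch.nn.functional as F

class TargetNormalizer:
    """Running mean/variance normalization of value targets."""
    def __init__(self, momentum=0.9):
        self.mean, self.var, self.m = 0.0, 1.0, momentum

    def __call__(self, targets):
        self.mean = self.m * self.mean + (1 - self.m) * targets.mean().item()
        self.var = self.m * self.var + (1 - self.m) * targets.var().item()
        return (targets - self.mean) / (self.var ** 0.5 + 1e-8)

def distill_loss(new_logits, old_logits, T=1.0):
    """KL between the frozen old policy and the current one on replayed states."""
    return F.kl_div(F.log_softmax(new_logits / T, dim=-1),
                    F.softmax(old_logits / T, dim=-1),
                    reduction="batchmean")

norm = TargetNormalizer()
targets = torch.randn(32) * 50 + 100          # large-reward old task
for _ in range(50):                           # statistics warm up over batches
    out = norm(targets)
print(out.mean().item(), out.std().item())    # roughly standardized
print(distill_loss(torch.randn(32, 4), torch.randn(32, 4)).item())
```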
Learning Better with Less: Effective Augmentation for Sample-Efficient Visual Reinforcement Learning
Ma, Guozheng, Zhang, Linrui, Wang, Haoyu, Li, Lu, Wang, Zilin, Wang, Zhen, Shen, Li, Wang, Xueqian, Tao, Dacheng
Data augmentation (DA) is a crucial technique for enhancing the sample efficiency of visual reinforcement learning (RL) algorithms. Notably, employing simple observation transformations alone can yield outstanding performance without extra auxiliary representation tasks or pre-trained encoders. However, it remains unclear which attributes of DA account for its effectiveness in achieving sample-efficient visual RL. To investigate this issue and further explore the potential of DA, this work conducts comprehensive experiments to assess how DA's attributes affect its efficacy and provides the following insights and improvements: (1) For individual DA operations, we reveal that both ample spatial diversity and slight hardness are indispensable. Building on this finding, we introduce Random PadResize (Rand PR), a new DA operation that offers abundant spatial diversity with minimal hardness. (2) For multi-type DA fusion schemes, the increased DA hardness and unstable data distribution prevent current fusion schemes from achieving higher sample efficiency than their corresponding individual operations. Taking the non-stationary nature of RL into account, we propose an RL-tailored multi-type DA fusion scheme called Cycling Augmentation (CycAug), which performs periodic cycles of different DA operations to increase type diversity while maintaining data distribution consistency. Extensive evaluations on the DeepMind Control suite and the CARLA driving simulator demonstrate that our methods achieve superior sample efficiency compared with prior state-of-the-art methods.
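A minimal sketch of the two named techniques, with padding margins and the cycling period as assumed parameters: Random PadResize pads an observation by random margins and resizes it back, and CycAug switches the active augmentation type periodically rather than mixing types within a batch.

```python
# Rand PR and CycAug-style scheduling (parameter choices are assumptions).
import torch
import torch.nn.functional as F

def random_pad_resize(obs, max_pad=8):
    """Pad each side by a random amount, then resize to the original shape."""
    _, _, h, w = obs.shape
    l, r, t, b = torch.randint(0, max_pad + 1, (4,)).tolist()
    padded = F.pad(obs, (l, r, t, b), mode="replicate")
    return F.interpolate(padded, size=(h, w), mode="bilinear",
                         align_corners=False)

def random_shift(obs, pad=4):
    """Common shift augmentation: pad then random-crop back."""
    _, _, h, w = obs.shape
    padded = F.pad(obs, (pad,) * 4, mode="replicate")
    x, y = torch.randint(0, 2 * pad + 1, (2,)).tolist()
    return padded[:, :, y:y + h, x:x + w]

def cyc_aug(obs, step, period=10_000, ops=(random_pad_resize, random_shift)):
    """CycAug-style scheduling: one augmentation type per period, cycled."""
    return ops[(step // period) % len(ops)](obs)

obs = torch.rand(2, 3, 84, 84)
print(cyc_aug(obs, step=5_000).shape, cyc_aug(obs, step=15_000).shape)
```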
Learning visual-based deformable object rearrangement with local graph neural networks
Deng, Yuhong, Wang, Xueqian, Chen, Lipeng
Goal-conditioned rearrangement of deformable objects (e.g. straightening a rope or folding a cloth) is one of the most common deformable manipulation tasks, in which the robot must rearrange a deformable object into a prescribed goal configuration using only visual observations. These tasks typically face two main challenges: the high dimensionality of the deformable configuration space, and the complexity, nonlinearity, and uncertainty inherent in deformable dynamics. To address these challenges, we propose a novel representation strategy that efficiently models deformable object states with a set of keypoints and their interactions. We further propose local GNN, a lightweight local graph neural network that learns to jointly model the deformable rearrangement dynamics and infer optimal manipulation actions (e.g. pick and place) by constructing and updating two dynamic graphs. Both simulated and real experiments demonstrate that the proposed dynamic graph representation offers superior expressiveness in modeling deformable rearrangement dynamics. Our method reaches much higher success rates on a variety of deformable rearrangement tasks (96.3% on average) than state-of-the-art methods in simulation experiments. Moreover, our method is much lighter and has a 60% shorter inference time than state-of-the-art methods. We also demonstrate that our method performs well in multi-task learning scenarios and can be transferred to real-world applications with an average success rate of 95% by solely fine-tuning a keypoint detector.
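A minimal sketch of the keypoint-graph idea, assuming a k-nearest-neighbour graph over detected keypoints and a single mean-aggregation message-passing step; the two-graph construction of the actual method is simplified away here.

```python
# Keypoint graph + light message passing, scoring pick/place per node.
import torch
import torch.nn as nn

def knn_edges(points, k=3):
    # points: (N, 2); returns (N, k) neighbour indices (excluding self)
    d = torch.cdist(points, points)
    d.fill_diagonal_(float("inf"))
    return d.topk(k, largest=False).indices

class LocalGNN(nn.Module):
    def __init__(self, d=32):
        super().__init__()
        self.embed = nn.Linear(4, d)          # (current_xy, goal_xy) per node
        self.msg = nn.Linear(2 * d, d)
        self.pick = nn.Linear(d, 1)
        self.place = nn.Linear(d, 2)

    def forward(self, cur, goal, k=3):
        h = torch.relu(self.embed(torch.cat([cur, goal], dim=-1)))  # (N, d)
        nbr = knn_edges(cur, k)                                     # (N, k)
        agg = h[nbr].mean(dim=1)                                    # mean message
        h = torch.relu(self.msg(torch.cat([h, agg], dim=-1)))
        return self.pick(h).squeeze(-1), self.place(h)

cur, goal = torch.rand(16, 2), torch.rand(16, 2)
pick_scores, place_xy = LocalGNN()(cur, goal)
print(pick_scores.argmax().item(), place_xy.shape)  # best pick node, (16, 2)
```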
The Road to On-board Change Detection: A Lightweight Patch-Level Change Detection Network via Exploring the Potential of Pruning and Pooling
Xue, Lihui, Wang, Zhihao, Wang, Xueqian, Li, Gang
Existing satellite remote sensing change detection (CD) methods often crop original large-scale bi-temporal image pairs into small patch pairs and then process all patch pairs uniformly with pixel-level CD methods. However, due to the sparsity of change in large-scale satellite remote sensing images, existing pixel-level CD methods waste computational and memory resources on large unchanged areas, which reduces the processing efficiency of on-board platforms with extremely limited computation and memory resources. To address this issue, we propose a lightweight patch-level CD network (LPCDNet) that rapidly removes unchanged patch pairs from large-scale bi-temporal image pairs. This accelerates the subsequent pixel-level CD stage and reduces its memory costs. In LPCDNet, a sensitivity-guided channel pruning method is proposed to remove unimportant channels and construct a lightweight backbone network based on ResNet18. Then, a multi-layer feature compression (MLFC) module is designed to compress and fuse the multi-level feature information of bi-temporal image patches. The output of the MLFC module is fed into a fully connected decision network to generate the predicted binary label. Finally, a weighted cross-entropy loss is used during training to tackle the change/unchange class imbalance. Experiments on two CD datasets demonstrate that LPCDNet achieves more than 1,000 frames per second on an edge computing platform (NVIDIA Jetson AGX Orin), more than 3 times the throughput of existing methods, without noticeable CD performance loss. In addition, our method reduces the memory costs of the subsequent pixel-level CD stage by more than 60%.
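The class-imbalance remedy is standard enough to show directly; a minimal sketch with an assumed 19:1 weight (the real weights would be set from the dataset's change/unchange ratio):

```python
# Weighted cross-entropy for change/unchange imbalance (weights assumed).
import torch
import torch.nn as nn

# e.g. ~5% of patches changed -> up-weight the change class roughly 19:1
class_weights = torch.tensor([1.0, 19.0])          # [unchange, change]
criterion = nn.CrossEntropyLoss(weight=class_weights)

logits = torch.randn(8, 2)                         # decision-network outputs
labels = torch.tensor([0, 0, 0, 0, 0, 0, 0, 1])    # mostly unchanged
print(criterion(logits, labels).item())
```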
Revisiting Plasticity in Visual Reinforcement Learning: Data, Modules and Training Stages
Ma, Guozheng, Li, Lu, Zhang, Sen, Liu, Zixuan, Wang, Zhen, Chen, Yixin, Shen, Li, Wang, Xueqian, Tao, Dacheng
Plasticity, the ability of a neural network to evolve with new data, is crucial for high-performance and sample-efficient visual reinforcement learning (VRL). Although methods like resetting and regularization can potentially mitigate plasticity loss, the influence of the various components within the VRL framework on the agent's plasticity remains poorly understood. In this work, we conduct a systematic empirical exploration focusing on three primary underexplored facets and derive the following insightful conclusions: (1) data augmentation is essential for maintaining plasticity; (2) the critic's plasticity loss is the principal bottleneck impeding efficient training; and (3) without timely intervention to recover the critic's plasticity in the early stages, the loss becomes catastrophic. These insights suggest a novel strategy for addressing the high replay ratio (RR) dilemma, where exacerbated plasticity loss hinders the potential improvement in sample efficiency brought by increased reuse frequency. Rather than setting a static RR for the entire training process, we propose Adaptive RR, which dynamically adjusts the RR based on the critic's plasticity level. Extensive evaluations indicate that Adaptive RR not only avoids catastrophic plasticity loss in the early stages but also benefits from more frequent reuse in later phases, resulting in superior sample efficiency.
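A minimal sketch of the Adaptive RR control logic, assuming the critic's plasticity is already summarized as a scalar in [0, 1]; the probe, thresholds, and ratio values are all illustrative:

```python
# Adaptive replay ratio keyed to a critic-plasticity measure (illustrative).
def adaptive_replay_ratio(plasticity, low_rr=0.5, high_rr=2.0, threshold=0.7):
    """Keep RR low while plasticity is fragile; raise it once it recovers."""
    return high_rr if plasticity >= threshold else low_rr

# Toy training loop: as plasticity recovers over time, the RR increases.
for step in range(0, 50_000, 10_000):
    plasticity = min(1.0, step / 30_000)     # stand-in for a measured value
    rr = adaptive_replay_ratio(plasticity)
    print(f"step {step:>6}: plasticity={plasticity:.2f} -> replay ratio={rr}")
```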