Peng, Ying
GEVRM: Goal-Expressive Video Generation Model For Robust Visual Manipulation
Zhang, Hongyin, Ding, Pengxiang, Lyu, Shangke, Peng, Ying, Wang, Donglin
With the rapid development of embodied artificial intelligence, significant progress has been made in vision-language-action (VLA) models for general robot decision-making. However, the majority of existing VLAs fail to account for the inevitable external perturbations encountered during deployment. These perturbations introduce unforeseen state information to the VLA, resulting in inaccurate actions and, consequently, a significant decline in generalization performance. The classic internal model control (IMC) principle demonstrates that a closed-loop system with an internal model that includes external input signals can accurately track the reference input and effectively offset the disturbance. We propose a novel closed-loop VLA method, GEVRM, that integrates the IMC principle to enhance the robustness of robot visual manipulation. The text-guided video generation model in GEVRM can generate highly expressive future visual planning goals. Simultaneously, we evaluate perturbations by simulating responses, which are called internal embeddings and are optimized through prototype contrastive learning. This allows the model to implicitly infer and distinguish perturbations from the external environment. The proposed GEVRM achieves state-of-the-art performance on both standard and perturbed CALVIN benchmarks and shows significant improvements in realistic robot tasks.
A novel control method for solving high-dimensional Hamiltonian systems through deep neural networks
Ji, Shaolin, Peng, Shige, Peng, Ying, Zhang, Xichuan
In this paper, we focus on solving high-dimensional stochastic Hamiltonian systems with boundary conditions, which are essentially Forward Backward Stochastic Differential Equations (FBSDEs for short), and propose a novel method from the viewpoint of stochastic control. To obtain an approximate solution of the Hamiltonian system, we first introduce a corresponding stochastic optimal control problem such that the extended Hamiltonian system of the control problem is exactly the one we need to solve. We then develop two different algorithms suitable for different cases of the control problem and approximate the stochastic control via deep neural networks. In the numerical results, compared with the Deep FBSDE method developed previously from the viewpoint of solving FBSDEs, the novel algorithms converge faster, meaning that they require fewer training steps, and demonstrate more stable convergence for different Hamiltonian systems.
Signal Transformer: Complex-valued Attention and Meta-Learning for Signal Recognition
Dong, Yihong, Peng, Ying, Yang, Muqiao, Lu, Songtao, Shi, Qingjiang
Deep neural networks have proven to be useful tools for addressing signal recognition problems in recent years, especially for identifying the nonlinear feature structures of signals. However, the power of most deep learning techniques relies heavily on an abundant amount of training data, so the performance of classic neural nets decreases sharply when the number of training samples is small or when unseen data are presented in the testing phase. This calls for an advanced strategy, i.e., model-agnostic meta-learning (MAML), which is able to capture the invariant representation of the data samples or signals. In this paper, inspired by the special structure of the signal, i.e., the real and imaginary parts present in practical time-series signals, we propose a Complex-valued Attentional MEta Learner (CAMEL) for the problem of few-shot signal recognition by leveraging attention and meta-learning in the complex domain. To the best of our knowledge, this is also the first complex-valued MAML that can find the first-order stationary points of general nonconvex problems with theoretical convergence guarantees. Extensive experimental results showcase the superiority of the proposed CAMEL compared with state-of-the-art methods.