Xue, Zhengrong


DemoGen: Synthetic Demonstration Generation for Data-Efficient Visuomotor Policy Learning

arXiv.org Artificial Intelligence

Visuomotor policies have shown great promise in robotic manipulation but often require substantial amounts of human-collected data for effective performance. A key reason underlying the data demands is their limited spatial generalization capability, which necessitates extensive data collection across different object configurations. In this work, we present DemoGen, a low-cost, fully synthetic approach for automatic demonstration generation. Using only one human-collected demonstration per task, DemoGen generates spatially augmented demonstrations by adapting the demonstrated action trajectory to novel object configurations. Visual observations are synthesized by leveraging 3D point clouds as the modality and rearranging the subjects in the scene via 3D editing. Empirically, DemoGen significantly enhances policy performance across a diverse range of real-world manipulation tasks, showing its applicability even in challenging scenarios involving deformable objects, dexterous hand end-effectors, and bimanual platforms. Furthermore, DemoGen can be extended to enable additional out-of-distribution capabilities, including disturbance resistance and obstacle avoidance.
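
As a rough illustration of the spatial-augmentation idea, the sketch below (plain NumPy; all function and variable names are my own, not DemoGen's API) translates an object's points in a recorded point cloud and shifts the demonstrated end-effector trajectory by the same offset, producing one synthetic demonstration for a novel object configuration.

# Hedged sketch of spatial demonstration augmentation; names are illustrative assumptions.
import numpy as np

def generate_augmented_demo(points, obj_mask, actions, delta_xy):
    """Shift the manipulated object and the demonstrated waypoints by a planar
    offset delta_xy = (dx, dy), leaving the rest of the scene untouched.

    points:   (N, 3) scene point cloud of the source demonstration
    obj_mask: (N,) boolean mask selecting the object's points
    actions:  (T, 3) end-effector positions of the demonstrated trajectory
    """
    offset = np.array([delta_xy[0], delta_xy[1], 0.0])

    # 3D-edit the observation: translate only the object's points.
    new_points = points.copy()
    new_points[obj_mask] += offset

    # Adapt the action trajectory (here, naively, the whole trajectory).
    new_actions = actions + offset
    return new_points, new_actions

# Usage: synthesize one augmented demo from one source demo.
rng = np.random.default_rng(0)
pts = rng.uniform(-0.3, 0.3, size=(2048, 3))
mask = np.linalg.norm(pts[:, :2], axis=1) < 0.05   # pretend these points are the object
traj = np.linspace([0.0, 0.0, 0.2], [0.0, 0.0, 0.0], 50)
aug_pts, aug_traj = generate_augmented_demo(pts, mask, traj, delta_xy=(0.10, -0.05))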


MENTOR: Mixture-of-Experts Network with Task-Oriented Perturbation for Visual Reinforcement Learning

arXiv.org Artificial Intelligence

Visual deep reinforcement learning (RL) enables robots to acquire skills from visual input for unstructured tasks. However, current algorithms suffer from low sample efficiency, limiting their practical applicability. In this work, we present MENTOR, a method that improves both the architecture and optimization of RL agents. Specifically, MENTOR replaces the standard multi-layer perceptron (MLP) with a mixture-of-experts (MoE) backbone, enhancing the agent's ability to handle complex tasks by leveraging modular expert learning to avoid gradient conflicts. Furthermore, MENTOR introduces a task-oriented perturbation mechanism, which heuristically samples perturbation candidates containing task-relevant information, leading to more targeted and effective optimization. MENTOR outperforms state-of-the-art methods across three simulation domains: DeepMind Control Suite, Meta-World, and Adroit. Additionally, MENTOR achieves an average success rate of 83% on three challenging real-world robotic manipulation tasks, Peg Insertion, Cable Routing, and Tabletop Golf, significantly surpassing the 32% success rate of the current strongest model-free visual RL algorithm. These results underscore the importance of sample efficiency in advancing visual RL for real-world robotics. Experimental videos are available at mentor.

Figure 1: MENTOR is validated in real-world tasks. We design three challenging robotic learning tasks for the agent to acquire skills through real-world visual reinforcement learning. MENTOR achieves the most efficient and robust policies compared to the baselines.
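
To make the architectural change concrete, here is a minimal sketch of a soft-gated mixture-of-experts policy head of the kind that could replace a plain MLP; the layer sizes, gating scheme, and class name are assumptions for illustration rather than MENTOR's exact design.

# Hedged sketch of an MoE policy backbone; not the paper's exact architecture.
import torch
import torch.nn as nn

class MoEPolicy(nn.Module):
    def __init__(self, obs_dim, act_dim, num_experts=4, hidden=256):
        super().__init__()
        # Each expert is a small MLP; a learned gate mixes their outputs per input.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                          nn.Linear(hidden, act_dim))
            for _ in range(num_experts)
        )
        self.gate = nn.Linear(obs_dim, num_experts)

    def forward(self, obs):
        weights = torch.softmax(self.gate(obs), dim=-1)            # (B, E)
        outs = torch.stack([e(obs) for e in self.experts], dim=1)  # (B, E, A)
        return (weights.unsqueeze(-1) * outs).sum(dim=1)           # (B, A)

policy = MoEPolicy(obs_dim=64, act_dim=6)
action = policy(torch.randn(8, 64))   # batch of 8 observations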


AToM-Bot: Embodied Fulfillment of Unspoken Human Needs with Affective Theory of Mind

arXiv.org Artificial Intelligence

We propose AToM-Bot, a novel task generation and execution framework for proactive robot-human interaction, which leverages the human mental and physical state inference capabilities of a Vision Language Model (VLM) prompted with the Affective Theory of Mind (AToM). Without requiring explicit commands from humans, AToM-Bot proactively generates and follows feasible tasks to improve general human well-being. When around humans, AToM-Bot first detects current human needs based on inferred human states and observations of the surrounding environment. It then generates tasks to fulfill these needs, taking into account its embodied constraints. We designed 16 daily life scenarios spanning 4 common scenes and presented the same visual stimuli to 59 human subjects and our robot. We used the similarity between open-ended human answers and the robot's output, together with human satisfaction scores, to measure robot performance. AToM-Bot received high human evaluations in need detection (6.42/7, 91.7%), embodied solution (6.15/7, 87.8%), and task execution (6.17/7, 88.1%). We show that AToM-Bot excels in generating and executing feasible plans to fulfill unspoken human needs. Videos and code are available at https://affective-tom-bot.github.io.
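
The pipeline above (infer unspoken needs, then generate embodiment-feasible tasks) could be prompted roughly as follows; vlm is a placeholder for whatever vision-language backend is used, and the prompts and function names are illustrative, not AToM-Bot's actual implementation.

# Hedged sketch of a two-stage prompting flow; vlm(prompt, image) -> str is a stand-in.
def infer_needs(vlm, image):
    prompt = ("You observe the scene in the image. Using affective theory of mind, "
              "infer the person's current mental and physical state and list any "
              "unspoken needs.")
    return vlm(prompt, image)

def generate_tasks(vlm, image, needs, robot_skills):
    prompt = (f"The person appears to need: {needs}. "
              f"The robot can only: {', '.join(robot_skills)}. "
              "Propose concrete tasks the robot can execute to help.")
    return vlm(prompt, image)

# Usage with any callable vlm(prompt, image) -> str:
# needs = infer_needs(vlm, image)
# tasks = generate_tasks(vlm, image, needs, ["pick", "place", "open drawer"])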


RiEMann: Near Real-Time SE(3)-Equivariant Robot Manipulation without Point Cloud Segmentation

arXiv.org Artificial Intelligence

We present RiEMann, an end-to-end near Real-time SE(3)-Equivariant Robot Manipulation imitation learning framework that operates on scene point cloud input. Compared to previous methods that rely on descriptor field matching, RiEMann directly predicts the target poses of objects for manipulation without any object segmentation. RiEMann learns a manipulation task from scratch with 5 to 10 demonstrations, generalizes to unseen SE(3) transformations and instances of target objects, resists visual interference from distracting objects, and follows the near real-time pose changes of the target object. The scalable action space of RiEMann facilitates the addition of custom equivariant actions, such as the direction in which to turn a faucet, which makes articulated object manipulation possible for RiEMann. In simulation and real-world 6-DOF robot manipulation experiments, we test RiEMann on 5 categories of manipulation tasks with a total of 25 variants and show that RiEMann outperforms baselines in both task success rate and SE(3) geodesic distance error on predicted poses (reduced by 68.6%), and achieves a network inference speed of 5.4 frames per second (FPS). Code and video results are available at https://riemann-web.github.io/.
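
For reference, the SE(3) pose error behind a "geodesic distance" metric is commonly computed as below, split into a rotation geodesic and a translation distance; this is a standard formulation, not necessarily the paper's exact weighting.

# Hedged sketch of a common SE(3) pose-error computation.
import numpy as np

def rotation_geodesic(R_pred, R_gt):
    # Angle of the relative rotation R_pred^T R_gt, in radians.
    cos = (np.trace(R_pred.T @ R_gt) - 1.0) / 2.0
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

def translation_error(t_pred, t_gt):
    return float(np.linalg.norm(np.asarray(t_pred) - np.asarray(t_gt)))

# Usage: identity vs. a 90-degree rotation about z gives pi/2.
Rz = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
print(rotation_geodesic(np.eye(3), Rz))            # ~1.5708 rad
print(translation_error([0, 0, 0], [0.1, 0, 0]))   # 0.1 m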


ArrayBot: Reinforcement Learning for Generalizable Distributed Manipulation through Touch

arXiv.org Artificial Intelligence

The notion of robotic manipulation [1, 2] easily invokes the image of a biomimetic robot arm or hand trying to grasp tabletop objects and then rearrange them into desired configurations inferred by exteroceptive sensors such as RGB-D cameras. To facilitate this manipulation pipeline, the robot learning community has made tremendous efforts in either determining steadier grasping poses in demanding scenarios [3, 4, 5, 6, 7] or understanding the exteroceptive inputs in a more robust and generalizable way [8, 9, 10, 11, 12, 13]. Acknowledging this progress, this paper attempts to bypass the challenges in the prevailing pipeline by advocating ArrayBot, a reinforcement-learning-driven system for distributed manipulation [14], where objects are manipulated through a large number of actuators with only proprioceptive tactile sensing [15, 16, 17, 18]. Conceptually, the hardware of ArrayBot is a 16 × 16 array of vertically sliding pillars, each of which can be independently actuated, leading to a 16 × 16 (256-dimensional) action space. Functionally, the actuators beneath a tabletop object can support its weight and at the same time cooperate to lift, tilt, and even translate it through proper motion policies.
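
A toy sketch of what a 16 × 16 action space looks like in code: every pillar receives an independent height command each step. The environment below is a stand-in for illustration only, not ArrayBot's controller or simulator.

# Hedged sketch of a 16 x 16 distributed-manipulation action space.
import numpy as np

class ToyPillarArray:
    def __init__(self, size=16, max_height=0.05):
        self.size = size
        self.max_height = max_height
        self.heights = np.zeros((size, size))

    def step(self, action):
        # action: (16, 16) array of target pillar heights in meters.
        action = np.clip(np.asarray(action), 0.0, self.max_height)
        self.heights = action
        # A real system would now simulate/observe how the supported object moves.
        return self.heights.copy()

env = ToyPillarArray()
tilt = np.tile(np.linspace(0.0, 0.05, 16), (16, 1))  # ramp that tilts a supported object
obs = env.step(tilt)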


USEEK: Unsupervised SE(3)-Equivariant 3D Keypoints for Generalizable Manipulation

arXiv.org Artificial Intelligence

Can a robot manipulate intra-category unseen objects in arbitrary poses with the help of a mere demonstration of a grasping pose on a single object instance? In this paper, we try to address this intriguing challenge by using USEEK, an unsupervised SE(3)-equivariant keypoint method that enjoys alignment across instances in a category, to perform generalizable manipulation. USEEK follows a teacher-student structure to decouple unsupervised keypoint discovery from SE(3)-equivariant keypoint detection. With USEEK in hand, the robot can infer the category-level task-relevant object frames in an efficient and explainable manner, enabling manipulation of any intra-category objects from and to any poses. Through extensive experiments, we demonstrate that the keypoints produced by USEEK possess rich semantics, thus successfully transferring the functional knowledge from the demonstration object to novel ones. Compared with other object representations for manipulation, USEEK is more adaptive in the face of large intra-category shape variance, more robust with limited demonstrations, and more efficient at inference time.
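
One way aligned keypoints enable grasp transfer is to estimate the rigid transform between corresponding keypoint sets on the demonstration object and a novel instance, then re-express the demonstrated grasp in the new frame; the sketch below uses a standard Kabsch alignment and illustrative function names, not USEEK's actual interface.

# Hedged sketch of keypoint-based grasp transfer via rigid alignment.
import numpy as np

def rigid_transform_from_keypoints(src, dst):
    """Least-squares R, t such that dst ~ R @ src + t; src, dst are (K, 3)."""
    src_c = src - src.mean(axis=0)
    dst_c = dst - dst.mean(axis=0)
    U, _, Vt = np.linalg.svd(src_c.T @ dst_c)
    d = np.sign(np.linalg.det(Vt.T @ U.T))          # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst.mean(axis=0) - R @ src.mean(axis=0)
    return R, t

def transfer_grasp(grasp_R, grasp_t, kp_demo, kp_novel):
    # Re-express the demonstrated grasp pose for the novel object instance.
    R, t = rigid_transform_from_keypoints(kp_demo, kp_novel)
    return R @ grasp_R, R @ grasp_t + t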


Pre-Trained Image Encoder for Generalizable Visual Reinforcement Learning

arXiv.org Artificial Intelligence

Learning generalizable policies that can adapt to unseen environments remains challenging in visual Reinforcement Learning (RL). Existing approaches try to acquire a robust representation by diversifying the appearances of in-domain observations for better generalization. Limited by the specific observations of the environment, these methods ignore the possibility of exploring diverse real-world image datasets. In this paper, we investigate how a visual RL agent would benefit from off-the-shelf visual representations. Surprisingly, we find that the early layers of an ImageNet pre-trained ResNet model can provide rather generalizable representations for visual RL. Hence, we propose the Pre-trained Image Encoder for Generalizable visual reinforcement learning (PIE-G), a simple yet effective framework that can generalize to unseen visual scenarios in a zero-shot manner. Extensive experiments are conducted on the DMControl Generalization Benchmark, DMControl Manipulation Tasks, Drawer World, and CARLA to verify the effectiveness of PIE-G. Empirical evidence suggests PIE-G improves sample efficiency and significantly outperforms previous state-of-the-art methods in terms of generalization performance. In particular, PIE-G boasts a 55% generalization performance gain on average in the challenging video background setting. Project Page: https://sites.google.com/view/pie-g/home.
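
The core recipe (freeze the early layers of an ImageNet pre-trained ResNet and feed their features to the policy) can be sketched as follows; the cut point after layer2 and the input resolution are assumptions, and PIE-G's exact configuration may differ.

# Hedged sketch: frozen early ResNet layers as a visual RL encoder (torchvision >= 0.13).
import torch
import torch.nn as nn
from torchvision import models

resnet = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
early = nn.Sequential(resnet.conv1, resnet.bn1, resnet.relu, resnet.maxpool,
                      resnet.layer1, resnet.layer2)
for p in early.parameters():
    p.requires_grad = False   # keep the pre-trained features frozen
early.eval()

with torch.no_grad():
    feats = early(torch.randn(1, 3, 84, 84))   # (1, 128, 11, 11) for ResNet-18 at 84x84
    obs_embedding = feats.flatten(1)           # fed to the trainable policy head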


BiasedWalk: Learning Global-aware Node Embeddings via Biased Sampling

arXiv.org Artificial Intelligence

Popular node embedding methods such as DeepWalk follow the paradigm of performing random walks on the graph and then requiring each node's embedding to be close to those of the nodes that appear along with it in the walks. Though proven successful in various tasks, this paradigm reduces a graph with rich topology to a set of sequential sentences, thus omitting global information. To produce global-aware node embeddings, we propose BiasedWalk, a biased random walk strategy that favors nodes with similar semantics. Empirical evidence suggests BiasedWalk can generally enhance the global awareness of the generated embeddings.
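
A minimal sketch of a semantics-biased random walk consistent with the description above: transition probabilities are re-weighted by feature similarity, so walks tend to stay among semantically related nodes. The softmax temperature, similarity choice, and toy graph are illustrative assumptions, not BiasedWalk's exact procedure.

# Hedged sketch of a semantics-biased random walk sampler.
import numpy as np
import networkx as nx

def biased_walk(G, features, start, length, temp=1.0, rng=None):
    rng = rng or np.random.default_rng()
    walk = [start]
    for _ in range(length - 1):
        cur = walk[-1]
        nbrs = list(G.neighbors(cur))
        if not nbrs:
            break
        # Favor neighbors whose features are similar to the current node's.
        sims = np.array([features[cur] @ features[n] for n in nbrs])
        probs = np.exp(sims / temp)
        probs /= probs.sum()
        walk.append(nbrs[rng.choice(len(nbrs), p=probs)])
    return walk

# Usage on a toy graph with random node features:
G = nx.karate_club_graph()
feats = {n: np.random.default_rng(n).normal(size=8) for n in G.nodes}
print(biased_walk(G, feats, start=0, length=10))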