Teng, Xiao
VinT-6D: A Large-Scale Object-in-hand Dataset from Vision, Touch and Proprioception
Wan, Zhaoliang, Ling, Yonggen, Yi, Senlin, Qi, Lu, Lee, Wangwei, Lu, Minglei, Yang, Sicheng, Teng, Xiao, Lu, Peng, Yang, Xu, Yang, Ming-Hsuan, Cheng, Hui
This paper addresses the scarcity of large-scale datasets for accurate object-in-hand pose estimation, which is crucial for robotic in-hand manipulation within the "Perception-Planning-Control" paradigm. Specifically, we introduce VinT-6D, the first extensive multi-modal dataset integrating vision, touch, and proprioception to enhance robotic manipulation. VinT-6D comprises two splits: VinT-Sim, with 2 million samples collected via simulations in MuJoCo and Blender, and VinT-Real, with 0.1 million samples collected on a custom-designed real-world platform. The dataset is tailored to robotic hands, providing whole-hand tactile perception and high-quality, well-aligned data. To the best of our knowledge, VinT-Real is the largest such real-world collection given the difficulty of data acquisition in real environments, and it helps bridge the simulation-to-real gap left by previous works. Built upon VinT-6D, we present a benchmark method that achieves significant performance improvements by fusing multi-modal information. The project is available at https://VinT-6D.github.io/.
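To make the multi-modal fusion idea concrete, the following is a minimal sketch (not the paper's benchmark architecture) of fusing vision, touch, and proprioception features for object-in-hand 6D pose regression; all module names and feature dimensions are illustrative assumptions.

# Minimal sketch, assuming pre-extracted per-modality features; the paper's
# actual benchmark method may differ substantially.
import torch
import torch.nn as nn


class VinTFusionBaseline(nn.Module):
    def __init__(self, vis_dim=256, tac_dim=64, prop_dim=32, hidden=256):
        super().__init__()
        # Per-modality encoders (assumed simple MLPs for illustration).
        self.vis_enc = nn.Sequential(nn.Linear(vis_dim, hidden), nn.ReLU())
        self.tac_enc = nn.Sequential(nn.Linear(tac_dim, hidden), nn.ReLU())
        self.prop_enc = nn.Sequential(nn.Linear(prop_dim, hidden), nn.ReLU())
        # Fusion head regresses translation (3) + rotation as a quaternion (4).
        self.head = nn.Sequential(
            nn.Linear(3 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, 7)
        )

    def forward(self, vis, tac, prop):
        fused = torch.cat(
            [self.vis_enc(vis), self.tac_enc(tac), self.prop_enc(prop)], dim=-1
        )
        out = self.head(fused)
        trans, quat = out[..., :3], out[..., 3:]
        quat = quat / quat.norm(dim=-1, keepdim=True)  # normalize to a unit quaternion
        return trans, quat


# Usage with random stand-in features for a batch of 8 samples.
model = VinTFusionBaseline()
t, q = model(torch.randn(8, 256), torch.randn(8, 64), torch.randn(8, 32))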
Relieving Universal Label Noise for Unsupervised Visible-Infrared Person Re-Identification by Inferring from Neighbors
Teng, Xiao, Lan, Long, Chen, Dingyao, Xu, Kele, Yin, Nan
Unsupervised visible-infrared person re-identification (USL-VI-ReID) is of great research and practical significance yet remains challenging due to the absence of annotations. Existing approaches aim to learn modality-invariant representations in an unsupervised setting. However, these methods often encounter label noise within and across modalities due to suboptimal clustering results and considerable modality discrepancies, which impedes effective training. To address these challenges, we propose a straightforward yet effective solution for USL-VI-ReID by mitigating universal label noise using neighbor information. Specifically, we introduce the Neighbor-guided Universal Label Calibration (N-ULC) module, which replaces explicit hard pseudo labels in both homogeneous and heterogeneous spaces with soft labels derived from neighboring samples to reduce label noise. Additionally, we present the Neighbor-guided Dynamic Weighting (N-DW) module to enhance training stability by minimizing the influence of unreliable samples. Extensive experiments on the RegDB and SYSU-MM01 datasets demonstrate that our method outperforms existing USL-VI-ReID approaches.
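The core soft-labeling idea can be illustrated with a short sketch in the spirit of N-ULC (the function name, the k value, and the simple neighbor-averaging rule are assumptions, not the paper's code): each sample's hard pseudo label is replaced by the mean of the one-hot labels of its k nearest neighbors in feature space.

# Hedged sketch of neighbor-guided soft-label calibration; not the authors' implementation.
import torch
import torch.nn.functional as F


def neighbor_soft_labels(feats, hard_labels, num_classes, k=10):
    """feats: (N, D) features; hard_labels: (N,) hard pseudo labels."""
    feats = F.normalize(feats, dim=1)
    sim = feats @ feats.t()                      # cosine similarity matrix (N, N)
    sim.fill_diagonal_(-1.0)                     # exclude the sample itself
    nn_idx = sim.topk(k, dim=1).indices          # (N, k) nearest neighbors
    one_hot = F.one_hot(hard_labels, num_classes).float()
    soft = one_hot[nn_idx].mean(dim=1)           # average the neighbors' labels
    return soft                                  # (N, num_classes) soft labels


# Toy usage: 100 samples, 5 pseudo-identity clusters.
feats = torch.randn(100, 128)
labels = torch.randint(0, 5, (100,))
soft = neighbor_soft_labels(feats, labels, num_classes=5)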
Lifelike Agility and Play in Quadrupedal Robots using Reinforcement Learning and Generative Pre-trained Models
Han, Lei, Zhu, Qingxu, Sheng, Jiapeng, Zhang, Chong, Li, Tingguang, Zhang, Yizheng, Zhang, He, Liu, Yuzhen, Zhou, Cheng, Zhao, Rui, Li, Jie, Zhang, Yufeng, Wang, Rui, Chi, Wanchao, Li, Xiong, Zhu, Yonghui, Xiang, Lingzhu, Teng, Xiao, Zhang, Zhengyou
Knowledge from animals and humans inspires robotic innovations. Numerous efforts have been made to achieve agile locomotion in quadrupedal robots through classical controllers or reinforcement learning approaches. These methods usually rely on physical models or handcrafted rewards to accurately describe the specific system, rather than on a generalized understanding like animals do. Here we propose a hierarchical framework to construct primitive-, environmental- and strategic-level knowledge that is all pre-trainable, reusable and enrichable for legged robots. The primitive module summarizes knowledge from animal motion data, where, inspired by large pre-trained models in language and image understanding, we introduce deep generative models to produce motor control signals stimulating legged robots to act like real animals. Then, we shape various traversing capabilities at a higher level to align with the environment by reusing the primitive module. Finally, a strategic module is trained focusing on complex downstream tasks by reusing the knowledge from previous levels. We apply the trained hierarchical controllers to the MAX robot, a quadrupedal robot developed in-house, to mimic animals, traverse complex obstacles and play in a designed challenging multi-agent chase tag game, where lifelike agility and strategy emerge in the robots.
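An illustrative sketch of the hierarchical reuse described above (not the paper's implementation): a pre-trained generative primitive decoder maps a latent command plus observation to joint targets, and a higher-level policy only emits latents while the primitive module stays frozen. All class names, dimensions, and the frozen-decoder setup are assumptions.

# Hedged sketch of primitive reuse in a hierarchical controller.
import torch
import torch.nn as nn


class PrimitiveDecoder(nn.Module):
    """Stand-in for a generative decoder pre-trained on animal motion data."""
    def __init__(self, latent_dim=16, obs_dim=48, act_dim=12):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + obs_dim, 256), nn.ELU(), nn.Linear(256, act_dim)
        )

    def forward(self, z, obs):
        return self.net(torch.cat([z, obs], dim=-1))  # joint position targets


class EnvironmentPolicy(nn.Module):
    """Higher-level policy that steers the robot by emitting primitive latents."""
    def __init__(self, obs_dim=48, latent_dim=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 256), nn.ELU(), nn.Linear(256, latent_dim))

    def forward(self, obs):
        return self.net(obs)


primitive = PrimitiveDecoder().eval()
for p in primitive.parameters():
    p.requires_grad_(False)                      # reuse the primitive module, frozen
policy = EnvironmentPolicy()

obs = torch.randn(1, 48)
action = primitive(policy(obs), obs)             # (1, 12) joint targets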
RefSAM: Efficiently Adapting Segmenting Anything Model for Referring Video Object Segmentation
Li, Yonglin, Zhang, Jing, Teng, Xiao, Lan, Long
The Segment Anything Model (SAM) has gained significant attention for its impressive performance in image segmentation. However, it lacks proficiency in referring video object segmentation (RVOS) due to the need for precise user-interactive prompts and a limited understanding of different modalities, such as language and vision. This paper presents the RefSAM model, which explores the potential of SAM for RVOS by incorporating multi-view information from diverse modalities and successive frames at different timestamps in an online manner. Our proposed approach adapts the original SAM model to enhance cross-modality learning by employing a lightweight Cross-Modal MLP that projects the text embedding of the referring expression into sparse and dense embeddings, serving as user-interactive prompts. Additionally, we introduce a hierarchical dense attention module that fuses hierarchical visual semantic information with sparse embeddings to obtain fine-grained dense embeddings, and an implicit tracking module that generates a track token and provides historical information for the mask decoder. Furthermore, we employ a parameter-efficient tuning strategy to effectively align and fuse the language and vision features. Through comprehensive ablation studies, we demonstrate the practical and effective design choices of our model. Extensive experiments conducted on Ref-YouTube-VOS, Ref-DAVIS17, and three referring image segmentation datasets validate the superiority and effectiveness of our RefSAM model over existing methods. The code and models will be made publicly available at github.com/LancasterLi/RefSAM.
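A rough sketch of the cross-modal prompting idea described above: a lightweight MLP projects a sentence-level text embedding into "sparse" and "dense" prompt embeddings for a SAM-style mask decoder. The shapes, number of sparse tokens, and module names are assumptions for illustration, not RefSAM's actual code.

# Hedged sketch of a Cross-Modal MLP producing SAM-style prompt embeddings.
import torch
import torch.nn as nn


class CrossModalMLP(nn.Module):
    def __init__(self, text_dim=768, prompt_dim=256, num_sparse=2, dense_hw=(64, 64)):
        super().__init__()
        self.prompt_dim = prompt_dim
        self.dense_hw = dense_hw
        self.sparse_proj = nn.Sequential(
            nn.Linear(text_dim, prompt_dim), nn.GELU(),
            nn.Linear(prompt_dim, num_sparse * prompt_dim),
        )
        self.dense_proj = nn.Linear(text_dim, prompt_dim)

    def forward(self, text_emb):                  # text_emb: (B, text_dim)
        b = text_emb.size(0)
        sparse = self.sparse_proj(text_emb).view(b, -1, self.prompt_dim)  # (B, num_sparse, prompt_dim)
        h, w = self.dense_hw
        dense = self.dense_proj(text_emb)[:, :, None, None].expand(-1, -1, h, w)  # (B, prompt_dim, h, w)
        return sparse, dense                      # token-like and spatial prompts


sparse, dense = CrossModalMLP()(torch.randn(2, 768))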
Learning to Purification for Unsupervised Person Re-identification
Lan, Long, Teng, Xiao, Zhang, Jing, Zhang, Xiang, Tao, Dacheng
Unsupervised person re-identification is a challenging and promising task in computer vision. Recent unsupervised person re-identification methods have achieved great progress by training with pseudo labels. However, how to purify feature and label noise has been less explicitly studied in the unsupervised setting. To purify the features, we take into account two types of additional features from different local views to enrich the feature representation. The proposed multi-view features are carefully integrated into our cluster contrast learning to leverage discriminative cues that the global feature tends to ignore or bias. To purify the label noise, we propose to take advantage of the knowledge of a teacher model in an offline scheme. Specifically, we first train a teacher model on noisy pseudo labels, and then use it to guide the learning of our student model. In this setting, the student model converges quickly under the supervision of the teacher model, reducing the interference of the noisy labels from which the teacher model suffered. After carefully handling the noise and bias in feature learning, our purification modules prove to be very effective for unsupervised person re-identification. Extensive experiments on three popular person re-identification datasets demonstrate the superiority of our method. In particular, our approach achieves state-of-the-art accuracy of 85.8% mAP and 94.5% Rank-1 on the challenging Market-1501 benchmark with ResNet-50 under the fully unsupervised setting. The code will be released.
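The offline teacher-student step can be illustrated with a short sketch: a teacher trained on noisy pseudo labels provides soft targets that supervise the student, softening the effect of label noise. The function name, temperature, and the plain KL-based distillation loss are assumptions for illustration, not the paper's exact objective.

# Hedged sketch of teacher-guided label purification via distillation.
import torch
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, hard_pseudo, T=4.0, alpha=0.5):
    """Blend soft teacher supervision with the (noisy) hard pseudo labels."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, hard_pseudo)
    return alpha * soft + (1.0 - alpha) * hard


# Toy usage: 16 samples, 10 pseudo-identity classes.
s = torch.randn(16, 10, requires_grad=True)
t = torch.randn(16, 10)
y = torch.randint(0, 10, (16,))
loss = distillation_loss(s, t, y)
loss.backward()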