AITopics | Chu, Xiaomeng

Collaborating Authors

Chu, Xiaomeng

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

GraspCoT: Integrating Physical Property Reasoning for 6-DoF Grasping under Flexible Language Instructions

Chu, Xiaomeng, Deng, Jiajun, You, Guoliang, Liu, Wei, Li, Xingchen, Ji, Jianmin, Zhang, Yanyong

arXiv.org Artificial IntelligenceMar-20-2025

Flexible instruction-guided 6-DoF grasping is a significant yet challenging task for real-world robotic systems. Existing methods utilize the contextual understanding capabilities of the large language models (LLMs) to establish mappings between expressions and targets, allowing robots to comprehend users' intentions in the instructions. However, the LLM's knowledge about objects' physical properties remains underexplored despite its tight relevance to grasping. In this work, we propose GraspCoT, a 6-DoF grasp detection framework that integrates a Chain-of-Thought (CoT) reasoning mechanism oriented to physical properties, guided by auxiliary question-answering (QA) tasks. Particularly, we design a set of QA templates to enable hierarchical reasoning that includes three stages: target parsing, physical property analysis, and grasp action selection. Moreover, GraspCoT presents a unified multimodal LLM architecture, which encodes multi-view observations of 3D scenes into 3D-aware visual tokens, and then jointly embeds these visual tokens with CoT-derived textual tokens within LLMs to generate grasp pose predictions. Furthermore, we present IntentGrasp, a large-scale benchmark that fills the gap in public datasets for multi-object grasp detection under diverse and indirect verbal commands. Extensive experiments on IntentGrasp demonstrate the superiority of our method, with additional validation in real-world robotic applications confirming its practicality. Codes and data will be released.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2503.16013

Country:

Europe > Italy (0.14)
Asia > Middle East > UAE (0.14)
Asia > China (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

CELLmap: Enhancing LiDAR SLAM through Elastic and Lightweight Spherical Map Representation

Duan, Yifan, Zhang, Xinran, Li, Yao, You, Guoliang, Chu, Xiaomeng, Ji, Jianmin, Zhang, Yanyong

arXiv.org Artificial IntelligenceSep-29-2024

SLAM is a fundamental capability of unmanned systems, with LiDAR-based SLAM gaining widespread adoption due to its high precision. Current SLAM systems can achieve centimeter-level accuracy within a short period. However, there are still several challenges when dealing with largescale mapping tasks including significant storage requirements and difficulty of reusing the constructed maps. To address this, we first design an elastic and lightweight map representation called CELLmap, composed of several CELLs, each representing the local map at the corresponding location. Then, we design a general backend including CELL-based bidirectional registration module and loop closure detection module to improve global map consistency. Our experiments have demonstrated that CELLmap can represent the precise geometric structure of large-scale maps of KITTI dataset using only about 60 MB. Additionally, our general backend achieves up to a 26.88% improvement over various LiDAR odometry methods.

artificial intelligence, machine learning, registration, (16 more...)

arXiv.org Artificial Intelligence

2409.19597

Country:

Asia > China (0.14)
Europe > Italy (0.14)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning (0.68)

Add feedback

Perception Helps Planning: Facilitating Multi-Stage Lane-Level Integration via Double-Edge Structures

You, Guoliang, Chu, Xiaomeng, Duan, Yifan, Zhang, Wenyu, Li, Xingchen, Zhang, Sha, Li, Yao, Ji, Jianmin, Zhang, Yanyong

arXiv.org Artificial IntelligenceJul-16-2024

When planning for autonomous driving, it is crucial to consider essential traffic elements such as lanes, intersections, traffic regulations, and dynamic agents. However, they are often overlooked by the traditional end-to-end planning methods, likely leading to inefficiencies and non-compliance with traffic regulations. In this work, we endeavor to integrate the perception of these elements into the planning task. To this end, we propose Perception Helps Planning (PHP), a novel framework that reconciles lane-level planning with perception. This integration ensures that planning is inherently aligned with traffic constraints, thus facilitating safe and efficient driving. Specifically, PHP focuses on both edges of a lane for planning and perception purposes, taking into consideration the 3D positions of both lane edges and attributes for lane intersections, lane directions, lane occupancy, and planning. In the algorithmic design, the process begins with the transformer encoding multi-camera images to extract the above features and predicting lane-level perception results. Next, the hierarchical feature early fusion module refines the features for predicting planning attributes. Finally, the double-edge interpreter utilizes a late-fusion process specifically designed to integrate lane-level perception and planning information, culminating in the generation of vehicle control signals. Experiments on three Carla benchmarks show significant improvements in driving score of 27.20%, 33.47%, and 15.54% over existing algorithms, respectively, achieving the state-of-the-art performance, with the system operating up to 22.57 FPS.

artificial intelligence, intersection, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2407.11644

Country:

North America > United States (0.46)
Asia > Middle East > Israel (0.14)

Genre: Research Report (0.64)

Industry: Transportation > Ground > Road (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.90)

Add feedback

Rotation Initialization and Stepwise Refinement for Universal LiDAR Calibration

Duan, Yifan, Zhang, Xinran, You, Guoliang, Wu, Yilong, Li, Xingchen, Li, Yao, Chu, Xiaomeng, Peng, Jie, Zhang, Yu, Ji, Jianmin, Zhang, Yanyong

arXiv.org Artificial IntelligenceMay-9-2024

Autonomous systems often employ multiple LiDARs to leverage the integrated advantages, enhancing perception and robustness. The most critical prerequisite under this setting is the estimating the extrinsic between each LiDAR, i.e., calibration. Despite the exciting progress in multi-LiDAR calibration efforts, a universal, sensor-agnostic calibration method remains elusive. According to the coarse-to-fine framework, we first design a spherical descriptor TERRA for 3-DoF rotation initialization with no prior knowledge. To further optimize, we present JEEP for the joint estimation of extrinsic and pose, integrating geometric and motion information to overcome factors affecting the point cloud registration. Finally, the LiDAR poses optimized by the hierarchical optimization module are input to time synchronization module to produce the ultimate calibration results, including the time offset. To verify the effectiveness, we conduct extensive experiments on eight datasets, where 16 diverse types of LiDARs in total and dozens of calibration tasks are tested. In the challenging tasks, the calibration errors can still be controlled within 5cm and 1{\deg} with a high success rate.

artificial intelligence, lidar, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2405.05589

Country: Asia > China (0.28)

Genre: Research Report (1.00)

Industry:

Transportation > Ground > Road (0.49)
Energy > Oil & Gas > Upstream (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
(2 more...)

Add feedback

$P^{3}O$: Transferring Visual Representations for Reinforcement Learning via Prompting

You, Guoliang, Chu, Xiaomeng, Duan, Yifan, Peng, Jie, Ji, Jianmin, Zhang, Yu, Zhang, Yanyong

arXiv.org Artificial IntelligenceMar-27-2023

It is important for deep reinforcement learning (DRL) algorithms to transfer their learned policies to new environments that have different visual inputs. In this paper, we introduce Prompt based Proximal Policy Optimization ($P^{3}O$), a three-stage DRL algorithm that transfers visual representations from a target to a source environment by applying prompting. The process of $P^{3}O$ consists of three stages: pre-training, prompting, and predicting. In particular, we specify a prompt-transformer for representation conversion and propose a two-step training process to train the prompt-transformer for the target environment, while the rest of the DRL pipeline remains unchanged. We implement $P^{3}O$ and evaluate it on the OpenAI CarRacing video game. The experimental results show that $P^{3}O$ outperforms the state-of-the-art visual transferring schemes. In particular, $P^{3}O$ allows the learned policies to perform well in environments with different visual inputs, which is much more effective than retraining the policies in these environments.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

2303.12371

Country: Asia > China > Anhui Province (0.15)

Genre: Research Report > New Finding (0.34)

Industry:

Transportation (0.47)
Leisure & Entertainment (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback

TLP: A Deep Learning-based Cost Model for Tensor Program Tuning

Zhai, Yi, Zhang, Yu, Liu, Shuo, Chu, Xiaomeng, Peng, Jie, Ji, Jianmin, Zhang, Yanyong

arXiv.org Artificial IntelligenceNov-22-2022

Tensor program tuning is a non-convex objective optimization problem, to which search-based approaches have proven to be effective. At the core of the search-based approaches lies the design of the cost model. Though deep learning-based cost models perform significantly better than other methods, they still fall short and suffer from the following problems. First, their feature extraction heavily relies on expert-level domain knowledge in hardware architectures. Even so, the extracted features are often unsatisfactory and require separate considerations for CPUs and GPUs. Second, a cost model trained on one hardware platform usually performs poorly on another, a problem we call cross-hardware unavailability. In order to address these problems, we propose TLP and MTLTLP. TLP is a deep learning-based cost model that facilitates tensor program tuning. Instead of extracting features from the tensor program itself, TLP extracts features from the schedule primitives. We treat schedule primitives as tensor languages. TLP is thus a Tensor Language Processing task. In this way, the task of predicting the tensor program latency through the cost model is transformed into a natural language processing (NLP) regression task. MTL-TLP combines Multi-Task Learning and TLP to cope with the cross-hardware unavailability problem. We incorporate these techniques into the Ansor framework and conduct detailed experiments. Results show that TLP can speed up the average search time by 9.1X and 3.0X on CPU and GPU workloads, respectively, compared to the state-of-the-art implementation. MTL-TLP can achieve a speed-up of 4.7X and 2.9X on CPU and GPU workloads, respectively, using only 7% of the target hardware data.

machine learning, natural language, tensor program, (15 more...)

arXiv.org Artificial Intelligence

2211.03578

Country: North America > United States (0.28)

Genre: Research Report (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback