Lu, Chris Xiaoxuan
When Pre-trained Visual Representations Fall Short: Limitations in Visuo-Motor Robot Learning
Tsagkas, Nikolaos, Sochopoulos, Andreas, Danier, Duolikun, Lu, Chris Xiaoxuan, Mac Aodha, Oisin
The integration of pre-trained visual representations (PVRs) into visuo-motor robot learning has emerged as a promising alternative to training visual encoders from scratch. However, PVRs face critical challenges in the context of policy learning, including temporal entanglement and an inability to generalise even in the presence of minor scene perturbations. These limitations hinder performance in tasks requiring temporal awareness and robustness to scene changes. This work identifies these shortcomings and proposes solutions to address them. First, we augment PVR features with temporal perception and a sense of task completion, effectively disentangling them in time. Second, we introduce a module that learns to selectively attend to task-relevant local features, enhancing robustness when evaluated on out-of-distribution scenes. Our experiments demonstrate significant performance improvements, particularly in PVRs trained with masking objectives, and validate the effectiveness of our enhancements in addressing PVR-specific limitations.
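For illustration only, here is a minimal sketch of the general idea of temporally disentangling frozen PVR features by appending a task-progress signal before the policy head. The module names, dimensions, and the simple concatenation scheme are assumptions for this sketch, not the authors' implementation:

```python
# Minimal sketch (not the paper's code): augmenting frozen PVR features with a
# normalised task-progress scalar so temporally adjacent observations become
# distinguishable to the policy head. Dimensions are illustrative assumptions.
import torch
import torch.nn as nn

class ProgressAugmentedPolicy(nn.Module):
    def __init__(self, pvr_dim=2048, act_dim=7):
        super().__init__()
        # Small MLP policy head on top of [PVR feature ; progress scalar].
        self.head = nn.Sequential(
            nn.Linear(pvr_dim + 1, 256), nn.ReLU(),
            nn.Linear(256, act_dim),
        )

    def forward(self, pvr_feat, t, episode_len):
        # Normalised time step acts as a coarse "sense of task completion".
        progress = (t.float() / episode_len.float()).unsqueeze(-1)
        return self.head(torch.cat([pvr_feat, progress], dim=-1))

# Example usage with random stand-ins for frozen PVR features.
feat = torch.randn(4, 2048)            # batch of PVR embeddings
t = torch.tensor([0, 10, 20, 30])      # current time steps
policy = ProgressAugmentedPolicy()
actions = policy(feat, t, torch.tensor(40.0))
print(actions.shape)  # torch.Size([4, 7])
```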
RadarOcc: Robust 3D Occupancy Prediction with 4D Imaging Radar
Ding, Fangqiang, Wen, Xiangyu, Zhu, Lawrence, Li, Yiming, Lu, Chris Xiaoxuan
The 3D occupancy-based perception pipeline has significantly advanced autonomous driving by capturing detailed scene descriptions and demonstrating strong generalizability across various object categories and shapes. Current methods predominantly rely on LiDAR or camera inputs for 3D occupancy prediction. However, these methods are susceptible to adverse weather conditions, limiting the all-weather deployment of self-driving cars. To improve perception robustness, we leverage recent advances in automotive radars and introduce a novel approach that utilizes 4D imaging radar sensors for 3D occupancy prediction. Our method, RadarOcc, circumvents the limitations of sparse radar point clouds by directly processing the 4D radar tensor, thus preserving essential scene details. RadarOcc addresses the challenges posed by the voluminous and noisy 4D radar data by employing Doppler-bin descriptors, sidelobe-aware spatial sparsification, and range-wise self-attention mechanisms. To minimize the interpolation errors associated with direct coordinate transformations, we also devise a spherical-based feature encoding followed by spherical-to-Cartesian feature aggregation. We benchmark various baseline methods based on distinct modalities on the public K-Radar dataset. The results demonstrate RadarOcc's state-of-the-art performance in radar-based 3D occupancy prediction and promising results even when compared with LiDAR- or camera-based methods. Additionally, we present qualitative evidence of the superior performance of 4D radar in adverse weather conditions and explore the impact of key pipeline components through ablation studies.
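As a rough illustration of what "range-wise self-attention" can mean in this setting, the sketch below applies self-attention along the range axis of a radar feature volume, with each azimuth-elevation cell attending over its own range bins. The tensor shapes and module are assumptions for this sketch, not RadarOcc's actual architecture:

```python
# Illustrative sketch (assumed shapes, not RadarOcc's code): self-attention
# applied along the range axis of a (range, azimuth, elevation) feature volume.
import torch
import torch.nn as nn

class RangeWiseSelfAttention(nn.Module):
    def __init__(self, channels=32, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x):
        # x: (B, C, R, A, E) feature volume over range/azimuth/elevation.
        B, C, R, A, E = x.shape
        seq = x.permute(0, 3, 4, 2, 1).reshape(B * A * E, R, C)
        out, _ = self.attn(seq, seq, seq)   # each cell attends over its range bins
        return out.reshape(B, A, E, R, C).permute(0, 4, 3, 1, 2)

x = torch.randn(1, 32, 64, 16, 8)
y = RangeWiseSelfAttention()(x)
print(y.shape)  # torch.Size([1, 32, 64, 16, 8])
```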
Click to Grasp: Zero-Shot Precise Manipulation via Visual Diffusion Descriptors
Tsagkas, Nikolaos, Rome, Jack, Ramamoorthy, Subramanian, Mac Aodha, Oisin, Lu, Chris Xiaoxuan
Precise manipulation that is generalizable across scenes and objects remains a persistent challenge in robotics. Current approaches for this task heavily depend on having a significant number of training instances to handle objects with pronounced visual and/or geometric part ambiguities. Our work explores the grounding of fine-grained part descriptors for precise manipulation in a zero-shot setting by utilizing web-trained text-to-image diffusion-based generative models. We tackle the problem by framing it as a dense semantic part correspondence task. Our model returns a gripper pose for manipulating a specific part, using as reference a user-defined click from a source image of a visually different instance of the same object. We require no manual grasping demonstrations as we leverage the intrinsic object geometry and features. Practical experiments in a real-world tabletop scenario validate the efficacy of our approach, demonstrating its potential for advancing semantic-aware robotics manipulation. Web page: https://tsagkas.github.io/click2grasp
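To make the dense-correspondence framing concrete, here is a hedged sketch of only the matching step: given per-pixel descriptors for a source and a target image (e.g. extracted from a diffusion model; extraction not shown), find the target pixel whose descriptor best matches the user's click. The shapes and the cosine-similarity matching rule are illustrative assumptions, not the paper's exact procedure:

```python
# Hedged sketch of dense semantic part correspondence via descriptor matching.
import torch
import torch.nn.functional as F

def match_click(src_desc, tgt_desc, click_uv):
    # src_desc, tgt_desc: (C, H, W) dense descriptors; click_uv: (u, v) pixel.
    u, v = click_uv
    query = src_desc[:, v, u]                          # descriptor at the click
    C, H, W = tgt_desc.shape
    sims = F.cosine_similarity(
        query[:, None], tgt_desc.reshape(C, H * W), dim=0
    )                                                  # similarity to every target pixel
    idx = sims.argmax().item()
    return idx % W, idx // W                           # best-matching (u, v)

src = torch.randn(128, 64, 64)
tgt = torch.randn(128, 64, 64)
print(match_click(src, tgt, (10, 20)))
```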
milliFlow: Scene Flow Estimation on mmWave Radar Point Cloud for Human Motion Sensing
Ding, Fangqiang, Luo, Zhen, Zhao, Peijun, Lu, Chris Xiaoxuan
As we approach the era of ubiquitous computing, human motion sensing plays a crucial role in smart systems for decision making, user interaction, and personalized services. Extensive research has been conducted on human tracking, pose estimation, gesture recognition, and activity recognition, predominantly based on cameras in traditional methods. However, the intrusive nature of cameras limits their use in smart home applications. To address this, mmWave radars have gained popularity due to their privacy-friendly features. In this work, we propose milliFlow, a novel deep learning method that estimates scene flow as complementary motion information for mmWave point clouds, serving as an intermediate level of features and directly benefiting downstream human motion sensing tasks. Experimental results demonstrate the superior performance of our method, with an average 3D end-point error of 4.6 cm, significantly surpassing competing approaches. Furthermore, by incorporating scene flow information, we achieve remarkable improvements in human activity recognition, human parsing, and human body part tracking. To foster further research in this area, we will provide our codebase and dataset for open access upon acceptance.
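The 4.6 cm figure refers to the average 3D end-point error (EPE), a standard scene flow metric: the mean Euclidean distance between predicted and ground-truth per-point flow vectors. A minimal sketch of that metric (the data here is random and purely illustrative):

```python
# Average 3D end-point error (EPE) for per-point scene flow, in metres.
import numpy as np

def epe_3d(pred_flow, gt_flow):
    # pred_flow, gt_flow: (N, 3) per-point scene flow vectors in metres.
    return float(np.linalg.norm(pred_flow - gt_flow, axis=1).mean())

pred = np.random.randn(256, 3) * 0.05
gt = np.random.randn(256, 3) * 0.05
print(f"EPE: {epe_3d(pred, gt):.3f} m")
```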
RaTrack: Moving Object Detection and Tracking with 4D Radar Point Cloud
Pan, Zhijun, Ding, Fangqiang, Zhong, Hantao, Lu, Chris Xiaoxuan
Mobile autonomy relies on the precise perception of dynamic environments. Robustly tracking moving objects in the 3D world thus plays a pivotal role in applications like trajectory prediction, obstacle avoidance, and path planning. While most current methods utilize LiDARs or cameras for Multiple Object Tracking (MOT), the capabilities of 4D imaging radars remain largely unexplored. Recognizing the challenges posed by radar noise and point sparsity in 4D radar data, we introduce RaTrack, an innovative solution tailored for radar-based tracking. Bypassing the typical reliance on specific object types and 3D bounding boxes, our method focuses on motion segmentation and clustering, enriched by a motion estimation module. Evaluated on the View-of-Delft dataset, RaTrack showcases superior tracking precision for moving objects, largely surpassing the performance of the state of the art.
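For a rough sense of the class-agnostic "motion segmentation and clustering" step described above, the sketch below groups points flagged as moving into object candidates with DBSCAN. The thresholds, clustering parameters, and function names are assumptions for illustration, not RaTrack's actual pipeline:

```python
# Sketch: cluster points predicted as moving into class-agnostic candidates.
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_moving_points(points, moving_prob, thresh=0.5):
    # points: (N, 3) radar points; moving_prob: (N,) per-point motion scores.
    moving = points[moving_prob > thresh]
    if len(moving) == 0:
        return []
    labels = DBSCAN(eps=1.0, min_samples=3).fit_predict(moving)
    return [moving[labels == k] for k in set(labels) if k != -1]

pts = np.random.rand(200, 3) * 20.0
prob = np.random.rand(200)
clusters = cluster_moving_points(pts, prob)
print(f"{len(clusters)} moving-object candidates")
```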
See Beyond Seeing: Robust 3D Object Detection from Point Clouds via Cross-Modal Hallucination
Deng, Jianning, Chan, Gabriel, Zhong, Hantao, Lu, Chris Xiaoxuan
This paper presents a novel framework for robust 3D object detection from point clouds via cross-modal hallucination. Our proposed approach is agnostic to the hallucination direction between LiDAR and 4D radar. We introduce multiple alignments at both the spatial and feature levels to achieve simultaneous backbone refinement and hallucination generation. Specifically, spatial alignment is proposed to deal with the geometric discrepancy for better instance matching between LiDAR and radar. The feature alignment step further bridges the intrinsic attribute gap between the sensing modalities and stabilizes training. The trained object detection models handle difficult detection cases better, even though only single-modal data is used as input during inference. Extensive experiments on the View-of-Delft (VoD) dataset show that our proposed method outperforms state-of-the-art (SOTA) methods for both radar and LiDAR object detection while maintaining competitive runtime efficiency.
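As a purely illustrative sketch of what a feature-level alignment term can look like, the snippet below pulls the features hallucinated from the input modality towards the real features of the other modality for matched instances. The loss form and names are assumptions, not the paper's exact formulation:

```python
# Sketch of a cross-modal feature-alignment loss between hallucinated and
# teacher-modality features (used alongside the usual detection losses).
import torch
import torch.nn.functional as F

def feature_alignment_loss(halluc_feats, target_feats):
    # halluc_feats: features hallucinated from the input modality, (M, C)
    # target_feats: features from the other (teacher) modality,    (M, C)
    return F.smooth_l1_loss(halluc_feats, target_feats)

h = torch.randn(16, 256, requires_grad=True)
t = torch.randn(16, 256)
loss = feature_alignment_loss(h, t)
loss.backward()
print(loss.item())
```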
MM-Fi: Multi-Modal Non-Intrusive 4D Human Dataset for Versatile Wireless Sensing
Yang, Jianfei, Huang, He, Zhou, Yunjiao, Chen, Xinyan, Xu, Yuecong, Yuan, Shenghai, Zou, Han, Lu, Chris Xiaoxuan, Xie, Lihua
Existing solutions for human sensing, which mainly rely on cameras and wearable devices, are either privacy-intrusive or inconvenient to use. To address these issues, wireless sensing has emerged as a promising alternative, leveraging LiDAR, mmWave radar, and WiFi signals for device-free human sensing. In this paper, we propose MM-Fi, the first multi-modal non-intrusive 4D human dataset with 27 daily or rehabilitation action categories, to bridge the gap between wireless sensing and high-level human perception tasks. MM-Fi consists of over 320k synchronized frames across five modalities from 40 human subjects. Various annotations are provided to support potential sensing tasks, e.g., human pose estimation and action recognition. Extensive experiments have been conducted to compare the sensing capacity of individual and combined modalities across multiple tasks. We envision that MM-Fi can contribute to wireless sensing research with respect to action recognition, human pose estimation, multi-modal learning, cross-modal supervision, and interdisciplinary healthcare research.
Robust Human Detection under Visual Degradation via Thermal and mmWave Radar Fusion
Cai, Kaiwen, Xia, Qiyue, Li, Peize, Stankovic, John, Lu, Chris Xiaoxuan
The majority of human detection methods rely on sensors that use visible light (e.g., RGB cameras), but such sensors are limited in scenarios with degraded visual conditions. In this paper, we present a multimodal human detection system that combines portable thermal cameras and single-chip mmWave radars. To mitigate the noisy detection features caused by the low contrast of thermal cameras and the multi-path noise of radar point clouds, we propose a Bayesian feature extractor and a novel uncertainty-guided fusion method that surpasses a variety of competing approaches, whether single-modal or multi-modal. We evaluate the proposed method on real-world collected data and demonstrate that our approach outperforms state-of-the-art methods by a large margin.
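For intuition about uncertainty-guided fusion in general, here is a minimal sketch in the generic inverse-variance sense, where each modality contributes in proportion to its estimated precision. This is an assumption for illustration; the paper's exact fusion rule may differ:

```python
# Generic inverse-variance (precision-weighted) fusion of two modalities.
import torch

def uncertainty_weighted_fusion(feat_a, var_a, feat_b, var_b, eps=1e-6):
    # feat_*: (N, C) per-detection features; var_*: (N, C) predicted variances.
    w_a = 1.0 / (var_a + eps)
    w_b = 1.0 / (var_b + eps)
    return (w_a * feat_a + w_b * feat_b) / (w_a + w_b)

thermal = torch.randn(8, 64)
radar = torch.randn(8, 64)
fused = uncertainty_weighted_fusion(thermal, torch.rand(8, 64),
                                    radar, torch.rand(8, 64))
print(fused.shape)  # torch.Size([8, 64])
```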
Hidden Gems: 4D Radar Scene Flow Learning Using Cross-Modal Supervision
Ding, Fangqiang, Palffy, Andras, Gavrila, Dariu M., Lu, Chris Xiaoxuan
This work proposes a novel approach to 4D radar-based scene flow estimation via cross-modal learning. Our approach is motivated by the co-located sensing redundancy in modern autonomous vehicles. Such redundancy implicitly provides various forms of supervision cues for radar scene flow estimation. Specifically, we introduce a multi-task model architecture for the identified cross-modal learning problem and propose loss functions that opportunistically engage scene flow estimation using multiple cross-modal constraints for effective model training. Extensive experiments show the state-of-the-art performance of our method and demonstrate the effectiveness of cross-modal supervised learning in inferring more accurate 4D radar scene flow. We also show its usefulness for two subtasks: motion segmentation and ego-motion estimation. Our source code will be available at https://github.com/Toytiny/CMFlow.
Uncertainty Estimation for 3D Dense Prediction via Cross-Point Embeddings
Cai, Kaiwen, Lu, Chris Xiaoxuan, Huang, Xiaowei
Dense prediction tasks are common for 3D point clouds, but the uncertainties inherent in massive points and their embeddings have long been ignored. In this work, we present CUE, a novel uncertainty estimation method for dense prediction tasks in 3D point clouds. Inspired by metric learning, the key idea of CUE is to explore cross-point embeddings on top of a conventional 3D dense prediction pipeline. Specifically, CUE involves building a probabilistic embedding model and then enforcing metric alignments of massive points in the embedding space. We also propose CUE+, which enhances CUE by explicitly modeling cross-point dependencies in the covariance matrix. We demonstrate that both CUE and CUE+ are generic and effective for uncertainty estimation in 3D point clouds on two different tasks: (1) in 3D geometric feature learning, we obtain well-calibrated uncertainty for the first time; and (2) in semantic segmentation, we reduce the Expected Calibration Error of state-of-the-art uncertainty estimates by 16.5%. All uncertainties are estimated without compromising predictive performance.
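The Expected Calibration Error (ECE) cited above is a standard calibration metric: predictions are binned by confidence, and ECE is the weighted average gap between each bin's mean confidence and its empirical accuracy. A small sketch of the metric itself (random data, for illustration only):

```python
# Expected Calibration Error over confidence bins.
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    # confidences: (N,) predicted confidences in [0, 1]; correct: (N,) 0/1 outcomes.
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece, n = 0.0, len(confidences)
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += (mask.sum() / n) * gap
    return ece

conf = np.random.rand(1000)
corr = (np.random.rand(1000) < conf).astype(float)
print(f"ECE: {expected_calibration_error(conf, corr):.3f}")
```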