AITopics | Xu, Xiang

Collaborating Authors

Xu, Xiang

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

SuperFlow++: Enhanced Spatiotemporal Consistency for Cross-Modal Data Pretraining

Xu, Xiang, Kong, Lingdong, Shuai, Hui, Zhang, Wenwei, Pan, Liang, Chen, Kai, Liu, Ziwei, Liu, Qingshan

arXiv.org Artificial IntelligenceMar-25-2025

LiDAR representation learning has emerged as a promising approach to reducing reliance on costly and labor-intensive human annotations. While existing methods primarily focus on spatial alignment between LiDAR and camera sensors, they often overlook the temporal dynamics critical for capturing motion and scene continuity in driving scenarios. To address this limitation, we propose SuperFlow++, a novel framework that integrates spatiotemporal cues in both pretraining and downstream tasks using consecutive LiDAR-camera pairs. SuperFlow++ introduces four key components: (1) a view consistency alignment module to unify semantic information across camera views, (2) a dense-to-sparse consistency regularization mechanism to enhance feature robustness across varying point cloud densities, (3) a flow-based contrastive learning approach that models temporal relationships for improved scene understanding, and (4) a temporal voting strategy that propagates semantic information across LiDAR scans to improve prediction consistency. Extensive evaluations on 11 heterogeneous LiDAR datasets demonstrate that SuperFlow++ outperforms state-of-the-art methods across diverse tasks and driving conditions. Furthermore, by scaling both 2D and 3D backbones during pretraining, we uncover emergent properties that provide deeper insights into developing scalable 3D foundation models. With strong generalizability and computational efficiency, SuperFlow++ establishes a new benchmark for data-efficient LiDAR-based perception in autonomous driving. The code is publicly available at https://github.com/Xiangxu-0103/SuperFlow

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2503.19912

Country: Asia > China (0.28)

Genre: Research Report > Promising Solution (0.54)

Industry:

Transportation > Ground > Road (0.49)
Information Technology > Robotics & Automation (0.35)
Automobiles & Trucks (0.35)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback

EventFly: Event Camera Perception from Ground to the Sky

Kong, Lingdong, Lu, Dongyue, Xu, Xiang, Ng, Lai Xing, Ooi, Wei Tsang, Cottereau, Benoit R.

arXiv.org Artificial IntelligenceMar-25-2025

Cross-platform adaptation in event-based dense perception is crucial for deploying event cameras across diverse settings, such as vehicles, drones, and quadrupeds, each with unique motion dynamics, viewpoints, and class distributions. In this work, we introduce EventFly, a framework for robust cross-platform adaptation in event camera perception. Our approach comprises three key components: i) Event Activation Prior (EAP), which identifies high-activation regions in the target domain to minimize prediction entropy, fostering confident, domain-adaptive predictions; ii) EventBlend, a data-mixing strategy that integrates source and target event voxel grids based on EAP-driven similarity and density maps, enhancing feature alignment; and iii) EventMatch, a dual-discriminator technique that aligns features from source, target, and blended domains for better domain-invariant learning. To holistically assess cross-platform adaptation abilities, we introduce EXPo, a large-scale benchmark with diverse samples across vehicle, drone, and quadruped platforms. Extensive experiments validate our effectiveness, demonstrating substantial gains over popular adaptation methods. We hope this work can pave the way for more adaptive, high-performing event perception across diverse and complex environments.

artificial intelligence, machine learning, platform, (14 more...)

arXiv.org Artificial Intelligence

2503.19916

Genre: Research Report (1.00)

Industry:

Transportation > Ground > Road (1.00)
Information Technology (0.92)
Transportation > Passenger (0.68)
Transportation > Infrastructure & Services (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

LiMoE: Mixture of LiDAR Representation Learners from Automotive Scenes

Xu, Xiang, Kong, Lingdong, Shuai, Hui, Pan, Liang, Liu, Ziwei, Liu, Qingshan

arXiv.org Artificial IntelligenceJan-7-2025

LiDAR data pretraining offers a promising approach to leveraging large-scale, readily available datasets for enhanced data utilization. However, existing methods predominantly focus on sparse voxel representation, overlooking the complementary attributes provided by other LiDAR representations. In this work, we propose LiMoE, a framework that integrates the Mixture of Experts (MoE) paradigm into LiDAR data representation learning to synergistically combine multiple representations, such as range images, sparse voxels, and raw points. Our approach consists of three stages: i) Image-to-LiDAR Pretraining, which transfers prior knowledge from images to point clouds across different representations; ii) Contrastive Mixture Learning (CML), which uses MoE to adaptively activate relevant attributes from each representation and distills these mixed features into a unified 3D network; iii) Semantic Mixture Supervision (SMS), which combines semantic logits from multiple representations to boost downstream segmentation performance. Extensive experiments across 11 large-scale LiDAR datasets demonstrate our effectiveness and superiority. The code and model checkpoints have been made publicly accessible.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2501.04004

Country:

North America > United States (0.46)
Asia (0.28)
Europe (0.28)

Genre: Research Report > New Finding (0.67)

Industry: Transportation (0.68)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Add feedback

LargeAD: Large-Scale Cross-Sensor Data Pretraining for Autonomous Driving

Kong, Lingdong, Xu, Xiang, Liu, Youquan, Cen, Jun, Chen, Runnan, Zhang, Wenwei, Pan, Liang, Chen, Kai, Liu, Ziwei

arXiv.org Artificial IntelligenceJan-7-2025

Recent advancements in vision foundation models (VFMs) have revolutionized visual perception in 2D, yet their potential for 3D scene understanding, particularly in autonomous driving applications, remains underexplored. In this paper, we introduce LargeAD, a versatile and scalable framework designed for large-scale 3D pretraining across diverse real-world driving datasets. Our framework leverages VFMs to extract semantically rich superpixels from 2D images, which are aligned with LiDAR point clouds to generate high-quality contrastive samples. This alignment facilitates cross-modal representation learning, enhancing the semantic consistency between 2D and 3D data. We introduce several key innovations: i) VFM-driven superpixel generation for detailed semantic representation, ii) a VFM-assisted contrastive learning strategy to align multimodal features, iii) superpoint temporal consistency to maintain stable representations across time, and iv) multi-source data pretraining to generalize across various LiDAR configurations. Our approach delivers significant performance improvements over state-of-the-art methods in both linear probing and fine-tuning tasks for both LiDAR-based segmentation and object detection. Extensive experiments on eleven large-scale multi-modal datasets highlight our superior performance, demonstrating the adaptability, efficiency, and robustness in real-world autonomous driving scenarios.

machine learning, natural language, segmentation, (16 more...)

arXiv.org Artificial Intelligence

2501.04005

Country: Asia > China (0.46)

Genre:

Research Report > New Finding (0.67)
Research Report > Promising Solution (0.48)

Industry:

Automobiles & Trucks (1.00)
Transportation > Ground > Road (0.82)
Information Technology > Robotics & Automation (0.82)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Add feedback

4D Contrastive Superflows are Dense 3D Representation Learners

Xu, Xiang, Kong, Lingdong, Shuai, Hui, Zhang, Wenwei, Pan, Liang, Chen, Kai, Liu, Ziwei, Liu, Qingshan

arXiv.org Artificial IntelligenceJul-9-2024

In the realm of autonomous driving, accurate 3D perception is the foundation. However, developing such models relies on extensive human annotations -- a process that is both costly and labor-intensive. To address this challenge from a data representation learning perspective, we introduce SuperFlow, a novel framework designed to harness consecutive LiDAR-camera pairs for establishing spatiotemporal pretraining objectives. SuperFlow stands out by integrating two key designs: 1) a dense-to-sparse consistency regularization, which promotes insensitivity to point cloud density variations during feature learning, and 2) a flow-based contrastive learning module, carefully crafted to extract meaningful temporal cues from readily available sensor calibrations. To further boost learning efficiency, we incorporate a plug-and-play view consistency module that enhances the alignment of the knowledge distilled from camera views. Extensive comparative and ablation studies across 11 heterogeneous LiDAR datasets validate our effectiveness and superiority. Additionally, we observe several interesting emerging properties by scaling up the 2D and 3D backbones during pretraining, shedding light on the future research of 3D foundation models for LiDAR-based perception.

computer vision, machine learning, natural language, (12 more...)

arXiv.org Artificial Intelligence

2407.0619

Country: Asia > China (0.28)

Genre: Research Report (1.00)

Industry:

Transportation > Ground > Road (0.67)
Information Technology (0.67)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

Add feedback

An Empirical Study of Training State-of-the-Art LiDAR Segmentation Models

Sun, Jiahao, Qing, Chunmei, Xu, Xiang, Kong, Lingdong, Liu, Youquan, Li, Li, Zhu, Chenming, Zhang, Jingwei, Xiao, Zeqi, Chen, Runnan, Wang, Tai, Zhang, Wenwei, Chen, Kai

arXiv.org Artificial IntelligenceMay-30-2024

In the rapidly evolving field of autonomous driving, precise segmentation of LiDAR data is crucial for understanding complex 3D environments. Traditional approaches often rely on disparate, standalone codebases, hindering unified advancements and fair benchmarking across models. To address these challenges, we introduce MMDetection3D-lidarseg, a comprehensive toolbox designed for the efficient training and evaluation of state-of-the-art LiDAR segmentation models. We support a wide range of segmentation models and integrate advanced data augmentation techniques to enhance robustness and generalization. Additionally, the toolbox provides support for multiple leading sparse convolution backends, optimizing computational efficiency and performance. By fostering a unified framework, MMDetection3D-lidarseg streamlines development and benchmarking, setting new standards for research and application. Our extensive benchmark experiments on widely-used datasets demonstrate the effectiveness of the toolbox. The codebase and trained models have been publicly available, promoting further research and innovation in the field of LiDAR segmentation for autonomous driving.

artificial intelligence, machine learning, segmentation, (17 more...)

arXiv.org Artificial Intelligence

2405.1487

Country:

Europe (0.28)
Asia > China (0.28)

Genre: Research Report > New Finding (0.93)

Industry:

Information Technology (0.70)
Transportation (0.56)

Technology:

Information Technology > Sensing and Signal Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Add feedback

RHAML: Rendezvous-based Hierarchical Architecture for Mutual Localization

Chen, Gaoming, Song, Kun, Xu, Xiang, Liu, Wenhang, Xiong, Zhenhua

arXiv.org Artificial IntelligenceMay-19-2024

Mutual localization serves as the foundation for collaborative perception and task assignment in multi-robot systems. Effectively utilizing limited onboard sensors for mutual localization between marker-less robots is a worthwhile goal. However, due to inadequate consideration of large scale variations of the observed robot and localization refinement, previous work has shown limited accuracy when robots are equipped only with RGB cameras. To enhance the precision of localization, this paper proposes a novel rendezvous-based hierarchical architecture for mutual localization (RHAML). Firstly, to learn multi-scale robot features, anisotropic convolutions are introduced into the network, yielding initial localization results. Then, the iterative refinement module with rendering is employed to adjust the observed robot poses. Finally, the pose graph is conducted to globally optimize all localization results, which takes into account multi-frame observations. Therefore, a flexible architecture is provided that allows for the selection of appropriate modules based on requirements. Simulations demonstrate that RHAML effectively addresses the problem of multi-robot mutual localization, achieving translation errors below 2 cm and rotation errors below 0.5 degrees when robots exhibit 5 m of depth variation. Moreover, its practical utility is validated by applying it to map fusion when multi-robots explore unknown environments.

artificial intelligence, machine learning, robot, (17 more...)

arXiv.org Artificial Intelligence

2405.11726

Country: Asia > China (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Multi-Modal Data-Efficient 3D Scene Understanding for Autonomous Driving

Kong, Lingdong, Xu, Xiang, Ren, Jiawei, Zhang, Wenwei, Pan, Liang, Chen, Kai, Ooi, Wei Tsang, Liu, Ziwei

arXiv.org Artificial IntelligenceMay-8-2024

Efficient data utilization is crucial for advancing 3D scene understanding in autonomous driving, where reliance on heavily human-annotated LiDAR point clouds challenges fully supervised methods. Addressing this, our study extends into semi-supervised learning for LiDAR semantic segmentation, leveraging the intrinsic spatial priors of driving scenes and multi-sensor complements to augment the efficacy of unlabeled datasets. We introduce LaserMix++, an evolved framework that integrates laser beam manipulations from disparate LiDAR scans and incorporates LiDAR-camera correspondences to further assist data-efficient learning. Our framework is tailored to enhance 3D scene consistency regularization by incorporating multi-modality, including 1) multi-modal LaserMix operation for fine-grained cross-sensor interactions; 2) camera-to-LiDAR feature distillation that enhances LiDAR feature learning; and 3) language-driven knowledge guidance generating auxiliary supervisions using open-vocabulary models. The versatility of LaserMix++ enables applications across LiDAR representations, establishing it as a universally applicable solution. Our framework is rigorously validated through theoretical analysis and extensive experiments on popular driving perception datasets. Results demonstrate that LaserMix++ markedly outperforms fully supervised alternatives, achieving comparable accuracy with five times fewer annotations and significantly improving the supervised-only baselines. This substantial advancement underscores the potential of semi-supervised approaches in reducing the reliance on extensive labeled data in LiDAR-based 3D scene understanding systems.

artificial intelligence, machine learning, segmentation, (16 more...)

arXiv.org Artificial Intelligence

2405.05258

Country: Asia (0.14)

Genre: Research Report > New Finding (0.66)

Industry:

Transportation > Ground > Road (0.61)
Information Technology > Robotics & Automation (0.61)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

Calib3D: Calibrating Model Preferences for Reliable 3D Scene Understanding

Kong, Lingdong, Xu, Xiang, Cen, Jun, Zhang, Wenwei, Pan, Liang, Chen, Kai, Liu, Ziwei

arXiv.org Artificial IntelligenceMar-25-2024

Safety-critical 3D scene understanding tasks necessitate not only accurate but also confident predictions from 3D perception models. This study introduces Calib3D, a pioneering effort to benchmark and scrutinize the reliability of 3D scene understanding models from an uncertainty estimation viewpoint. We comprehensively evaluate 28 state-of-the-art models across 10 diverse 3D datasets, uncovering insightful phenomena that cope with both the aleatoric and epistemic uncertainties in 3D scene understanding. We discover that despite achieving impressive levels of accuracy, existing models frequently fail to provide reliable uncertainty estimates -- a pitfall that critically undermines their applicability in safety-sensitive contexts. Through extensive analysis of key factors such as network capacity, LiDAR representations, rasterization resolutions, and 3D data augmentation techniques, we correlate these aspects directly with the model calibration efficacy. Furthermore, we introduce DeptS, a novel depth-aware scaling approach aimed at enhancing 3D model calibration. Extensive experiments across a wide range of configurations validate the superiority of our method. We hope this work could serve as a cornerstone for fostering reliable 3D scene understanding. Code and benchmark toolkits are publicly available.

artificial intelligence, computer vision, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2403.1701

Country: Asia > China (0.14)

Genre: Research Report (1.00)

Industry:

Information Technology (0.46)
Energy (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback

Nature-Guided Cognitive Evolution for Predicting Dissolved Oxygen Concentrations in North Temperate Lakes

Yu, Runlong, Ladwig, Robert, Xu, Xiang, Zhu, Peijun, Hanson, Paul C., Xie, Yiqun, Jia, Xiaowei

arXiv.org Artificial IntelligenceFeb-15-2024

Predicting dissolved oxygen (DO) concentrations in north temperate lakes requires a comprehensive study of phenological patterns across various ecosystems, which highlights the significance of selecting phenological features and feature interactions. Process-based models are limited by partial process knowledge or oversimplified feature representations, while machine learning models face challenges in efficiently selecting relevant feature interactions for different lake types and tasks, especially under the infrequent nature of DO data collection. In this paper, we propose a Nature-Guided Cognitive Evolution (NGCE) strategy, which represents a multi-level fusion of adaptive learning with natural processes. Specifically, we utilize metabolic process-based models to generate simulated DO labels. Using these simulated labels, we implement a multi-population cognitive evolutionary search, where models, mirroring natural organisms, adaptively evolve to select relevant feature interactions within populations for different lake types and tasks. These models are not only capable of undergoing crossover and mutation mechanisms within intra-populations but also, albeit infrequently, engage in inter-population crossover. The second stage involves refining these models by retraining them with real observed labels. We have tested the performance of our NGCE strategy in predicting daily DO concentrations across a wide range of lakes in the Midwest, USA. These lakes, varying in size, depth, and trophic status, represent a broad spectrum of north temperate lakes. Our findings demonstrate that NGCE not only produces accurate predictions with few observed labels but also, through gene maps of models, reveals sophisticated phenological patterns of different lakes.

artificial intelligence, evolutionary algorithm, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2403.18923

Country:

North America > United States > Wisconsin (0.14)
North America > United States > Maryland (0.14)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine (0.49)
Food & Agriculture (0.46)
Energy > Oil & Gas > Upstream (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback