Cooperative Perception
Flow-based Feature Fusion for Vehicle-Infrastructure Cooperative 3D Object Detection - Appendix
Yu, Haibao, Tang, Yingjuan
Mean Average Precision (mAP). For VIC3D object detection, we focus on the obstacles around the ego vehicle. Two metrics are used for evaluation: BEV@mAP and 3D@mAP. BEV@mAP evaluates the 3D boxes in the bird's-eye view and ignores the height dimension, while 3D@mAP evaluates the full 3D boxes. In our implementation, we ignore the transmission cost of calibration files and timestamps. For early fusion, we calculate the transmission cost of transmitting the raw data.
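To make the BEV@mAP vs. 3D@mAP distinction in the appendix excerpt above concrete, here is a minimal sketch (not the paper's evaluation code) using axis-aligned boxes; real VIC3D evaluation handles yawed boxes, but the point stands: BEV IoU drops the height extent that 3D IoU keeps.

```python
# Illustrative only: axis-aligned boxes (x, y, z, dx, dy, dz).
# BEV IoU ignores the z extent; 3D IoU keeps it.

def interval_overlap(c1, d1, c2, d2):
    """Overlap length of two 1-D intervals given centers and sizes."""
    lo = max(c1 - d1 / 2, c2 - d2 / 2)
    hi = min(c1 + d1 / 2, c2 + d2 / 2)
    return max(0.0, hi - lo)

def bev_iou(a, b):
    """IoU of the boxes' bird's-eye-view footprints (height ignored)."""
    inter = (interval_overlap(a[0], a[3], b[0], b[3])
             * interval_overlap(a[1], a[4], b[1], b[4]))
    union = a[3] * a[4] + b[3] * b[4] - inter
    return inter / union if union > 0 else 0.0

def iou_3d(a, b):
    """Full 3D IoU: the BEV overlap scaled by the z-axis overlap."""
    inter = (interval_overlap(a[0], a[3], b[0], b[3])
             * interval_overlap(a[1], a[4], b[1], b[4])
             * interval_overlap(a[2], a[5], b[2], b[5]))
    union = a[3] * a[4] * a[5] + b[3] * b[4] * b[5] - inter
    return inter / union if union > 0 else 0.0

# Identical footprints but offset heights: BEV IoU is 1.0,
# while 3D IoU drops because the vertical overlap is partial.
gt   = (0.0, 0.0, 0.0, 4.0, 2.0, 1.5)
pred = (0.0, 0.0, 0.5, 4.0, 2.0, 1.5)
print(bev_iou(gt, pred))  # 1.0
print(iou_3d(gt, pred))   # 0.5
```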
DRCP: Diffusion on Reinforced Cooperative Perception for Perceiving Beyond Limits
Li, Lantao, Yang, Kang, Song, Rui, Sun, Chen
Abstract-- Cooperative perception enabled by Vehicle-to-Everything communication has shown great promise in enhancing situational awareness for autonomous vehicles and other mobile robotic platforms. Despite recent advances in perception backbones and multi-agent fusion, real-world deployments remain challenged by hard detection cases, exemplified by partial detections and noise accumulation, which limit downstream detection accuracy. This work presents Diffusion on Reinforced Cooperative Perception (DRCP), a real-time deployable framework designed to address the aforementioned issues in dynamic driving environments. The proposed system achieves real-time performance on mobile platforms while significantly improving robustness under challenging conditions. Code will be released in late 2025. Robotic systems such as autonomous vehicles and mobile agents rely heavily on perception to understand their surroundings and make informed decisions.
- Information Technology (0.68)
- Transportation (0.46)
V2V-GoT: Vehicle-to-Vehicle Cooperative Autonomous Driving with Multimodal Large Language Models and Graph-of-Thoughts
Chiu, Hsu-kuang, Hachiuma, Ryo, Wang, Chien-Yi, Wang, Yu-Chiang Frank, Chen, Min-Hung, Smith, Stephen F.
Abstract-- Current state-of-the-art autonomous vehicles could face safety-critical situations when their local sensors are occluded by large nearby objects on the road. Vehicle-to-vehicle (V2V) cooperative autonomous driving has been proposed as a means of addressing this problem, and one recently introduced framework for cooperative autonomous driving has further adopted an approach that incorporates a Multimodal Large Language Model (MLLM) to integrate cooperative perception and planning processes. However, despite the potential benefit of applying graph-of-thoughts reasoning to the MLLM, this idea has not been considered by previous cooperative autonomous driving research. In this paper, we propose a novel graph-of-thoughts framework specifically designed for MLLM-based cooperative autonomous driving. Our graph-of-thoughts includes our proposed novel ideas of occlusion-aware perception and planning-aware prediction. We curate the V2V-GoT-QA dataset and develop the V2V-GoT model for training and testing the cooperative driving graph-of-thoughts. Our experimental results show that our method outperforms other baselines in cooperative perception, prediction, and planning tasks. Today's autonomous vehicles rely mainly on mounted cameras or LiDAR sensors to perceive the world, understand the dynamic surrounding scenes, and make driving decisions over time. Inherently, such reliance on the vehicle's local sensors can be limiting, particularly in situations where vehicles and other potential obstacles are occluded by other large nearby objects, such as buses or trucks.
- Transportation > Ground > Road (1.00)
- Information Technology > Robotics & Automation (1.00)
- Automobiles & Trucks (1.00)
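To illustrate the pipeline structure V2V-GoT describes, here is a hedged Python sketch of a graph-of-thoughts executed in dependency order; the node names, dict-based state, and topological execution are illustrative assumptions, not the V2V-GoT implementation.

```python
# Perception, prediction, and planning as "thought" nodes whose outputs
# feed their successors, in the spirit of the abstract. Hypothetical names.

from graphlib import TopologicalSorter

def occlusion_aware_perception(state):
    # Peers contribute detections the ego's occluded sensors would miss.
    state["objects"] = ["car_1", "truck_2 (reported by V2V peer)"]

def planning_aware_prediction(state):
    # Prediction conditioned on candidate ego plans, per the paper's idea.
    state["trajectories"] = {o: f"forecast({o} | ego plan)"
                             for o in state["objects"]}

def planning(state):
    state["plan"] = "yield to truck_2, then proceed"

# Each node maps to the set of nodes it depends on.
graph = {
    occlusion_aware_perception: set(),
    planning_aware_prediction: {occlusion_aware_perception},
    planning: {planning_aware_prediction},
}

state = {}
for node in TopologicalSorter(graph).static_order():
    node(state)  # each thought consumes its predecessors' outputs
print(state["plan"])
```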
Research Challenges and Progress in the End-to-End V2X Cooperative Autonomous Driving Competition
Hao, Ruiyang, Yu, Haibao, Zhong, Jiaru, Wang, Chuanye, Wang, Jiahao, Kan, Yiming, Yang, Wenxian, Fan, Siqi, Yin, Huilin, Qiu, Jianing, Mu, Yao, Sun, Jiankai, Chen, Li, Zimmer, Walter, Zhang, Dandan, Zhang, Shanghang, Schwager, Mac, Luo, Ping, Nie, Zaiqing
With the rapid advancement of autonomous driving technology, vehicle-to-everything (V2X) communication has emerged as a key enabler for extending perception range and enhancing driving safety by providing visibility beyond the line of sight. However, integrating multi-source sensor data from both ego-vehicles and infrastructure under real-world constraints, such as limited communication bandwidth and dynamic environments, presents significant technical challenges. To facilitate research in this area, we organized the End-to-End Autonomous Driving through V2X Cooperation Challenge, which features two tracks: cooperative temporal perception and cooperative end-to-end planning. Built on the UniV2X framework and the V2X-Seq-SPD dataset, the challenge attracted participation from over 30 teams worldwide and established a unified benchmark for evaluating cooperative driving systems. This paper describes the design and outcomes of the challenge, highlights key research problems including bandwidth-aware fusion, robust multi-agent planning, and heterogeneous sensor integration, and analyzes emerging technical trends among top-performing solutions. By addressing practical constraints in communication and data fusion, the challenge contributes to the development of scalable and reliable V2X-cooperative autonomous driving systems.
- Asia > China > Hong Kong (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
- North America > United States > Texas (0.04)
- (5 more...)
- Overview (0.93)
- Research Report (0.64)
- Transportation > Ground > Road (1.00)
- Information Technology > Robotics & Automation (1.00)
- Automobiles & Trucks (1.00)
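Bandwidth-aware fusion is one of the research problems the challenge highlights; the sketch below illustrates the general idea of shrinking an infrastructure BEV feature map before transmission. The channel-energy selector and 8-bit quantization are assumptions for illustration, not the UniV2X implementation.

```python
# Hedged sketch: channel reduction plus per-channel 8-bit quantization
# of a BEV feature map before sending it over a V2X link.

import numpy as np

def compress_bev_features(feat: np.ndarray, keep_channels: int):
    """feat: (C, H, W) float32 BEV features from the infrastructure side."""
    # Keep the highest-energy channels (a stand-in for a learned selector).
    energy = np.abs(feat).sum(axis=(1, 2))
    idx = np.argsort(energy)[-keep_channels:]
    kept = feat[idx]
    # Per-channel 8-bit quantization over each channel's value range.
    lo = kept.min(axis=(1, 2), keepdims=True)
    hi = kept.max(axis=(1, 2), keepdims=True)
    scale = np.where(hi > lo, (hi - lo) / 255.0, 1.0)
    q = np.round((kept - lo) / scale).astype(np.uint8)
    return q, idx, lo, scale  # receiver dequantizes with idx, lo, scale

feat = np.random.randn(256, 128, 128).astype(np.float32)
q, idx, lo, scale = compress_bev_features(feat, keep_channels=64)
print(f"raw: {feat.nbytes / 1e6:.1f} MB -> sent: {q.nbytes / 1e6:.1f} MB")
# raw: 16.8 MB -> sent: 1.0 MB, a ~16x reduction before entropy coding.
```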
TruckV2X: A Truck-Centered Perception Dataset
Xie, Tenghui, Song, Zhiying, Wen, Fuxi, Li, Jun, Liu, Guangzhao, Zhao, Zijian
Abstract-- Autonomous trucking offers significant benefits, such as improved safety and reduced costs, but faces unique perception challenges due to trucks' large size and dynamic trailer movements. These challenges include extensive blind spots and occlusions that hinder the truck's perception and the capabilities of other road users. To address these limitations, cooperative perception emerges as a promising solution. However, existing datasets predominantly feature light-vehicle interactions or lack multi-agent configurations for heavy-duty vehicle scenarios. To bridge this gap, we introduce TruckV2X, the first large-scale truck-centered cooperative perception dataset featuring multi-modal sensing (LiDAR and cameras) and multi-agent cooperation (tractors, trailers, CAVs, and RSUs). We further investigate how trucks influence collaborative perception needs, establishing performance benchmarks while suggesting research priorities for heavy vehicle perception. The dataset provides a foundation for developing cooperative perception systems with enhanced occlusion handling capabilities, and accelerates the deployment of multi-agent autonomous trucking systems. Autonomous trucking is expected to benefit the logistics industry through improved road safety, reduced operational costs, and solutions to driver shortages [1].
- North America > United States > District of Columbia > Washington (0.04)
- Asia > China > Beijing > Beijing (0.04)
- Transportation > Ground > Road (1.00)
- Transportation > Freight & Logistics Services (1.00)
- Automobiles & Trucks (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
Cooperative Perception: A Resource-Efficient Framework for Multi-Drone 3D Scene Reconstruction Using Federated Diffusion and NeRF
The proposal introduces a drone-swarm perception system that addresses computational limitations, low-bandwidth communication, and real-time scene reconstruction. The framework enables efficient multi-agent 3D/4D scene synthesis through federated learning of a shared diffusion model, lightweight YOLOv12 semantic extraction, and local NeRF updates, while maintaining privacy and scalability. It redesigns generative diffusion models for joint scene reconstruction, improves cooperative scene understanding, and adds semantic-aware compression protocols. The approach can be validated through simulation and potential real-world deployment on drone testbeds, positioning it as a disruptive advancement in multi-agent AI for autonomous systems.
- Research Report (0.64)
- Overview (0.46)
- Information Technology (0.94)
- Transportation > Ground > Road (0.47)
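The proposal above names federated learning of a shared diffusion model without specifying an algorithm; as one plausible instantiation, a FedAvg-style weighted aggregation of per-drone weight updates could look like the following sketch (the parameter names are hypothetical).

```python
# Minimal FedAvg sketch: drones share model weights, never raw imagery.
# FedAvg is an assumption for illustration, not the proposal's method.

from typing import Dict, List
import numpy as np

StateDict = Dict[str, np.ndarray]

def fed_avg(updates: List[StateDict], weights: List[float]) -> StateDict:
    """Weighted average of per-drone model weights (e.g., by sample count)."""
    total = sum(weights)
    merged: StateDict = {}
    for name in updates[0]:
        merged[name] = sum(w * u[name]
                           for w, u in zip(weights, updates)) / total
    return merged

# Three drones contribute local diffusion-model updates, weighted by how
# much data each collected this round.
drone_updates = [{"unet.conv1": np.random.randn(8, 8)} for _ in range(3)]
global_weights = fed_avg(drone_updates, weights=[120.0, 80.0, 200.0])
print(global_weights["unet.conv1"].shape)  # (8, 8)
```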
HeCoFuse: Cross-Modal Complementary V2X Cooperative Perception with Heterogeneous Sensors
Wei, Chuheng, Qin, Ziye, Zimmer, Walter, Wu, Guoyuan, Barth, Matthew J.
Real-world Vehicle-to-Everything (V2X) cooperative perception systems often operate under heterogeneous sensor configurations due to cost constraints and deployment variability across vehicles and infrastructure. This heterogeneity poses significant challenges for feature fusion and perception reliability. To address these issues, we propose HeCoFuse, a unified framework designed for cooperative perception across mixed sensor setups where nodes may carry Cameras (C), LiDARs (L), or both. By introducing a hierarchical fusion mechanism that adaptively weights features through a combination of channel-wise and spatial attention, HeCoFuse can tackle critical challenges such as cross-modality feature misalignment and imbalanced representation quality. In addition, an adaptive spatial resolution adjustment module is employed to balance computational cost and fusion effectiveness. To enhance robustness across different configurations, we further implement a cooperative learning strategy that dynamically adjusts fusion type based on available modalities. Experiments on the real-world TUMTraf-V2X dataset demonstrate that HeCoFuse achieves 43.22% 3D mAP under the full sensor configuration (LC+LC), outperforming the CoopDet3D baseline by 1.17%, and reaches an even higher 43.38% 3D mAP in the L+LC scenario, while maintaining 3D mAP in the range of 21.74% to 43.38% across nine heterogeneous sensor configurations. These results, validated by our first-place finish in the CVPR 2025 DriveX challenge, establish HeCoFuse as the current state-of-the-art on the TUMTraf-V2X dataset while demonstrating robust performance across diverse sensor deployments.
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- North America > United States > California > Riverside County > Riverside (0.04)
- Asia > China > Sichuan Province > Chengdu (0.04)
- Transportation (0.69)
- Automobiles & Trucks (0.46)
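As a rough illustration of fusing two agents' features via combined channel-wise and spatial attention, here is a hedged PyTorch sketch; the layer sizes and exact gating form are assumptions in the spirit of the HeCoFuse abstract, not the released model.

```python
# Hypothetical two-agent fusion: squeeze-excite channel gating followed
# by a per-pixel softmax that blends ego and cooperative BEV features.

import torch
import torch.nn as nn

class AttentiveFusion(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Channel attention: squeeze spatial dims, excite per channel.
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // 8, 1), nn.ReLU(),
            nn.Conv2d(channels // 8, channels, 1), nn.Sigmoid(),
        )
        # Spatial attention: one weight map per agent from stacked features.
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(2 * channels, 2, kernel_size=3, padding=1),
            nn.Softmax(dim=1),  # per-pixel weights sum to 1 over agents
        )

    def forward(self, ego: torch.Tensor, coop: torch.Tensor) -> torch.Tensor:
        # Reweight each agent's channels, then blend spatially.
        ego = ego * self.channel_gate(ego)
        coop = coop * self.channel_gate(coop)
        w = self.spatial_gate(torch.cat([ego, coop], dim=1))  # (B, 2, H, W)
        return w[:, 0:1] * ego + w[:, 1:2] * coop

fusion = AttentiveFusion(channels=64)
fused = fusion(torch.randn(1, 64, 100, 100), torch.randn(1, 64, 100, 100))
print(fused.shape)  # torch.Size([1, 64, 100, 100])
```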
CoInfra: A Large-Scale Cooperative Infrastructure Perception System and Dataset in Adverse Weather
Ning, Minghao, Yang, Yufeng, Shu, Keqi, Huang, Shucheng, Zhong, Jiaming, Salehi, Maryam, Rahmani, Mahdi, Lu, Yukun, Sun, Chen, Saleh, Aladdin, Hashemi, Ehsan, Khajepour, Amir
We present CoInfra, a large-scale cooperative infrastructure perception system and dataset designed to advance robust multi-agent perception under real-world and adverse weather conditions. The CoInfra system includes 14 fully synchronized sensor nodes, each equipped with dual RGB cameras and a LiDAR, deployed across a shared region and operating continuously to capture all traffic participants in real-time. A robust, delay-aware synchronization protocol and a scalable system architecture that supports real-time data fusion, OTA management, and remote monitoring are provided in this paper. The dataset was collected in diverse weather scenarios, including sunny, rainy, freezing-rain, and heavy-snow conditions, and comprises 195k LiDAR frames and 390k camera images from 8 infrastructure nodes that are globally time-aligned and spatially calibrated. Furthermore, comprehensive 3D bounding-box annotations for five object classes (i.e., car, bus, truck, person, and bicycle) are provided in both global and individual node frames, along with high-definition maps for contextual understanding. Baseline experiments demonstrate the trade-offs between early and late fusion strategies, and the significant benefits of HD map integration are discussed. By openly releasing our dataset, codebase, and system documentation at https://github.com/NingMingHao/CoInfra, we aim to enable reproducible research and drive progress in infrastructure-supported autonomous driving, particularly in challenging, real-world settings.
- North America > Canada > Ontario > Toronto (0.14)
- North America > Canada > Ontario > Waterloo Region > Waterloo (0.14)
- Asia > China > Hong Kong (0.04)
- (8 more...)
- Information Technology (1.00)
- Transportation > Ground > Road (0.89)
- Information Technology > Data Science > Data Integration (1.00)
- Information Technology > Communications > Networks (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
- (3 more...)
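As an illustration of the delay-aware synchronization CoInfra describes, the sketch below (not the CoInfra codebase) matches each reference frame to a node's nearest-in-time frame and drops pairs whose residual delay exceeds a tolerance.

```python
# Hypothetical nearest-timestamp matching across two sensor nodes.

import bisect

def match_frames(ref_ts, node_ts, tol_s=0.05):
    """ref_ts, node_ts: sorted lists of timestamps (seconds).
    Returns (ref_t, node_t) pairs whose gap is within tol_s."""
    pairs = []
    for t in ref_ts:
        i = bisect.bisect_left(node_ts, t)
        # The nearest frame is either just before or just after t.
        candidates = node_ts[max(0, i - 1):i + 1]
        if not candidates:
            continue
        best = min(candidates, key=lambda s: abs(s - t))
        if abs(best - t) <= tol_s:
            pairs.append((t, best))
    return pairs

ref = [0.00, 0.10, 0.20, 0.30]
node = [0.02, 0.11, 0.27, 0.31]   # node clock jitters slightly
print(match_frames(ref, node))
# [(0.0, 0.02), (0.1, 0.11), (0.3, 0.31)] -- 0.20 has no match within 50 ms
```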
AirV2X: Unified Air-Ground Vehicle-to-Everything Collaboration
Gao, Xiangbo, Wu, Yuheng, Yang, Fengze, Luo, Xuewen, Wu, Keshu, Chen, Xinghao, Wang, Yuping, Liu, Chenxi, Zhou, Yang, Tu, Zhengzhong
While multi-vehicular collaborative driving demonstrates clear advantages over single-vehicle autonomy, traditional infrastructure-based V2X systems remain constrained by substantial deployment costs and the creation of "uncovered danger zones" in rural and suburban areas. We present AirV2X-Perception, a large-scale dataset that leverages Unmanned Aerial Vehicles (UAVs) as a flexible alternative or complement to fixed Road-Side Units (RSUs). Drones offer unique advantages over ground-based perception: complementary bird's-eye views that reduce occlusions, dynamic positioning capabilities that enable hovering, patrolling, and escorting maneuvers, and significantly lower deployment costs compared to fixed infrastructure. Our dataset comprises 6.73 hours of drone-assisted driving scenarios across urban, suburban, and rural environments with varied weather and lighting conditions. The AirV2X-Perception dataset facilitates the development and standardized evaluation of Vehicle-to-Drone (V2D) algorithms, addressing a critical gap in the rapidly expanding field of aerial-assisted autonomous driving systems. The dataset and development kits are open-sourced at https://github.com/taco-group/AirV2X-Perception.
- North America > United States > Utah (0.04)
- North America > United States > Texas > Brazos County > College Station (0.04)
- North America > United States > Michigan (0.04)
- (2 more...)
- Energy (1.00)
- Transportation > Ground > Road (0.89)
- Automobiles & Trucks (0.89)
- Information Technology > Robotics & Automation (0.69)