AITopics | Yu, Haibao

Collaborating Authors

Yu, Haibao

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Griffin: Aerial-Ground Cooperative Detection and Tracking Dataset and Benchmark

Wang, Jiahao, Cao, Xiangyu, Zhong, Jiaru, Zhang, Yuner, Yu, Haibao, He, Lei, Xu, Shaobing

arXiv.org Artificial IntelligenceMar-10-2025

Despite significant advancements, autonomous driving systems continue to struggle with occluded objects and long-range detection due to the inherent limitations of single-perspective sensing. Aerial-ground cooperation offers a promising solution by integrating UAVs' aerial views with ground vehicles' local observations. However, progress in this emerging field has been hindered by the absence of public datasets and standardized evaluation benchmarks. To address this gap, this paper presents a comprehensive solution for aerial-ground cooperative 3D perception through three key contributions: (1) Griffin, a large-scale multi-modal dataset featuring over 200 dynamic scenes (30k+ frames) with varied UAV altitudes (20-60m), diverse weather conditions, and occlusion-aware 3D annotations, enhanced by CARLA-AirSim co-simulation for realistic UAV dynamics; (2) A unified benchmarking framework for aerial-ground cooperative detection and tracking tasks, including protocols for evaluating communication efficiency, latency tolerance, and altitude adaptability; (3) AGILE, an instance-level intermediate fusion baseline that dynamically aligns cross-view features through query-based interaction, achieving an advantageous balance between communication overhead and perception accuracy. Extensive experiments prove the effectiveness of aerial-ground cooperative perception and demonstrate the direction of further research. The dataset and codes are available at https://github.com/wang-jh18-SVM/Griffin.

artificial intelligence, dataset, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2503.06983

Country:

Asia > China (0.29)
North America > United States (0.28)
Europe (0.28)

Genre: Research Report (1.00)

Industry:

Transportation > Ground > Road (0.49)
Information Technology > Robotics & Automation (0.49)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (0.93)

Add feedback

LiDAR-based End-to-end Temporal Perception for Vehicle-Infrastructure Cooperation

Yang, Zhenwei, Mao, Jilei, Yang, Wenxian, Ai, Yibo, Kong, Yu, Yu, Haibao, Zhang, Weidong

arXiv.org Artificial IntelligenceNov-22-2024

Temporal perception, the ability to detect and track objects over time, is critical in autonomous driving for maintaining a comprehensive understanding of dynamic environments. However, this task is hindered by significant challenges, including incomplete perception caused by occluded objects and observational blind spots, which are common in single-vehicle perception systems. To address these issues, we introduce LET-VIC, a LiDAR-based End-to-End Tracking framework for Vehicle-Infrastructure Cooperation (VIC). LET-VIC leverages Vehicle-to-Everything (V2X) communication to enhance temporal perception by fusing spatial and temporal data from both vehicle and infrastructure sensors. First, it spatially integrates Bird's Eye View (BEV) features from vehicle-side and infrastructure-side LiDAR data, creating a comprehensive view that mitigates occlusions and compensates for blind spots. Second, LET-VIC incorporates temporal context across frames, allowing the model to leverage historical data for enhanced tracking stability and accuracy. To further improve robustness, LET-VIC includes a Calibration Error Compensation (CEC) module to address sensor misalignments and ensure precise feature alignment. Experiments on the V2X-Seq-SPD dataset demonstrate that LET-VIC significantly outperforms baseline models, achieving at least a 13.7% improvement in mAP and a 13.1% improvement in AMOTA without considering communication delays. This work offers a practical solution and a new research direction for advancing temporal perception in autonomous driving through vehicle-infrastructure cooperation.

artificial intelligence, lidar-based end-to-end temporal perception, vehicle-infrastructure cooperation

arXiv.org Artificial Intelligence

2411.14927

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Vision (0.53)

Add feedback

End-to-End Autonomous Driving through V2X Cooperation

Yu, Haibao, Yang, Wenxian, Zhong, Jiaru, Yang, Zhenwei, Fan, Siqi, Luo, Ping, Nie, Zaiqing

arXiv.org Artificial IntelligenceApr-19-2024

Cooperatively utilizing both ego-vehicle and infrastructure sensor data via V2X communication has emerged as a promising approach for advanced autonomous driving. However, current research mainly focuses on improving individual modules, rather than taking end-to-end learning to optimize final planning performance, resulting in underutilized data potential. In this paper, we introduce UniV2X, a pioneering cooperative autonomous driving framework that seamlessly integrates all key driving modules across diverse views into a unified network. We propose a sparse-dense hybrid data transmission and fusion mechanism for effective vehicle-infrastructure cooperation, offering three advantages: 1) Effective for simultaneously enhancing agent perception, online mapping, and occupancy prediction, ultimately improving planning performance. 2) Transmission-friendly for practical and limited communication conditions. 3) Reliable data fusion with interpretability of this hybrid data. We implement UniV2X, as well as reproducing several benchmark methods, on the challenging DAIR-V2X, the real-world cooperative driving dataset. Experimental results demonstrate the effectiveness of UniV2X in significantly enhancing planning performance, as well as all intermediate output performance. Code is at https://github.com/AIR-THU/UniV2X.

artificial intelligence, autonomous driving, univ2x, (13 more...)

arXiv.org Artificial Intelligence

2404.00717

Genre: Research Report > New Finding (0.34)

Industry:

Transportation > Ground > Road (1.00)
Automobiles & Trucks (1.00)
Information Technology > Robotics & Automation (0.96)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)

Add feedback

RCooper: A Real-world Large-scale Dataset for Roadside Cooperative Perception

Hao, Ruiyang, Fan, Siqi, Dai, Yingru, Zhang, Zhenlin, Li, Chenxi, Wang, Yuntian, Yu, Haibao, Yang, Wenxian, Yuan, Jirui, Nie, Zaiqing

arXiv.org Artificial IntelligenceMar-31-2024

The value of roadside perception, which could extend the boundaries of autonomous driving and traffic management, has gradually become more prominent and acknowledged in recent years. However, existing roadside perception approaches only focus on the single-infrastructure sensor system, which cannot realize a comprehensive understanding of a traffic area because of the limited sensing range and blind spots. Orienting high-quality roadside perception, we need Roadside Cooperative Perception (RCooper) to achieve practical area-coverage roadside perception for restricted traffic areas. Rcooper has its own domain-specific challenges, but further exploration is hindered due to the lack of datasets. We hence release the first real-world, large-scale RCooper dataset to bloom the research on practical roadside cooperative perception, including detection and tracking. The manually annotated dataset comprises 50k images and 30k point clouds, including two representative traffic scenes (i.e., intersection and corridor). The constructed benchmarks prove the effectiveness of roadside cooperation perception and demonstrate the direction of further research. Codes and dataset can be accessed at: https://github.com/AIR-THU/DAIR-RCooper.

artificial intelligence, machine learning, perception, (16 more...)

arXiv.org Artificial Intelligence

2403.10145

Country: Asia > China (0.14)

Genre: Research Report (0.64)

Industry: Transportation > Ground > Road (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Vision (0.97)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.95)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.49)

Add feedback

RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis

Mu, Yao, Chen, Junting, Zhang, Qinglong, Chen, Shoufa, Yu, Qiaojun, Ge, Chongjian, Chen, Runjian, Liang, Zhixuan, Hu, Mengkang, Tao, Chaofan, Sun, Peize, Yu, Haibao, Yang, Chao, Shao, Wenqi, Wang, Wenhai, Dai, Jifeng, Qiao, Yu, Ding, Mingyu, Luo, Ping

arXiv.org Artificial IntelligenceFeb-25-2024

Robotic behavior synthesis, the problem of understanding multimodal inputs and generating precise physical control for robots, is an important part of Embodied AI. Despite successes in applying multimodal large language models for high-level understanding, it remains challenging to translate these conceptual understandings into detailed robotic actions while achieving generalization across various scenarios. In this paper, we propose a tree-structured multimodal code generation framework for generalized robotic behavior synthesis, termed RoboCodeX. RoboCodeX decomposes high-level human instructions into multiple object-centric manipulation units consisting of physical preferences such as affordance and safety constraints, and applies code generation to introduce generalization ability across various robotics platforms. To further enhance the capability to map conceptual and perceptual understanding into control commands, a specialized multimodal reasoning dataset is collected for pre-training and an iterative self-updating methodology is introduced for supervised fine-tuning. Extensive experiments demonstrate that RoboCodeX achieves state-of-the-art performance in both simulators and real robots on four different kinds of manipulation tasks and one navigation task.

artificial intelligence, large language model, natural language, (17 more...)

arXiv.org Artificial Intelligence

2402.16117

Country: Asia > China (0.28)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment > Sports (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Automatic Programming (0.82)

Add feedback

V2X-Seq: A Large-Scale Sequential Dataset for Vehicle-Infrastructure Cooperative Perception and Forecasting

Yu, Haibao, Yang, Wenxian, Ruan, Hongzhi, Yang, Zhenwei, Tang, Yingjuan, Gao, Xu, Hao, Xin, Shi, Yifeng, Pan, Yifeng, Sun, Ning, Song, Juan, Yuan, Jirui, Luo, Ping, Nie, Zaiqing

arXiv.org Artificial IntelligenceMay-10-2023

Utilizing infrastructure and vehicle-side information to track and forecast the behaviors of surrounding traffic participants can significantly improve decision-making and safety in autonomous driving. However, the lack of real-world sequential datasets limits research in this area. To address this issue, we introduce V2X-Seq, the first large-scale sequential V2X dataset, which includes data frames, trajectories, vector maps, and traffic lights captured from natural scenery. V2X-Seq comprises two parts: the sequential perception dataset, which includes more than 15,000 frames captured from 95 scenarios, and the trajectory forecasting dataset, which contains about 80,000 infrastructure-view scenarios, 80,000 vehicle-view scenarios, and 50,000 cooperative-view scenarios captured from 28 intersections' areas, covering 672 hours of data. Based on V2X-Seq, we introduce three new tasks for vehicle-infrastructure cooperative (VIC) autonomous driving: VIC3D Tracking, Online-VIC Forecasting, and Offline-VIC Forecasting. We also provide benchmarks for the introduced tasks. Find data, code, and more up-to-date information at \href{https://github.com/AIR-THU/DAIR-V2X-Seq}{https://github.com/AIR-THU/DAIR-V2X-Seq}.

artificial intelligence, dataset, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2305.05938

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.68)

Industry:

Transportation > Ground > Road (1.00)
Information Technology (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback