DriveAgent: Multi-Agent Structured Reasoning with LLM and Multimodal Sensor Fusion for Autonomous Driving
Hou, Xinmeng, Wang, Wuqi, Yang, Long, Lin, Hao, Feng, Jinglun, Min, Haigen, Zhao, Xiangmo
arXiv.org Artificial Intelligence
Xinmeng Hou 2, Wuqi Wang 1, Long Yang 1, Hao Lin 3, Jinglun Feng 4, Haigen Min 1, Xiangmo Zhao 1

Abstract -- We introduce DriveAgent, a novel multi-agent autonomous driving framework that combines large language model (LLM) reasoning with multimodal sensor fusion to enhance situational understanding and decision-making. DriveAgent integrates diverse sensor modalities (camera, LiDAR, GPS, and IMU) with LLM-driven analytical processes structured across specialized agents. The framework operates through a modular agent-based pipeline comprising four principal modules: (i) a descriptive analysis agent that identifies critical sensor data events based on filtered timestamps, (ii) dedicated vehicle-level analysis conducted by LiDAR and vision agents that collaboratively assess vehicle conditions and movements, (iii) environmental reasoning and causal analysis agents that explain contextual changes and their underlying mechanisms, and (iv) an urgency-aware decision-generation agent that prioritizes insights and proposes timely maneuvers. This modular design lets the LLM coordinate specialized perception and reasoning agents and deliver cohesive, interpretable insights into complex autonomous driving scenarios. Extensive experiments on challenging autonomous driving datasets demonstrate that DriveAgent outperforms baseline methods on multiple metrics. These results validate the efficacy of the proposed LLM-driven multi-agent sensor fusion framework and underscore its potential to substantially enhance the robustness and reliability of autonomous driving systems.

I. INTRODUCTION

Autonomous driving (AD) has made promising progress in recent years; however, challenges such as contextual understanding and interpretability remain unsolved, especially in dynamic, multimodal environments [1]. Commonly adopted AD architectures, whether modular or end-to-end, often struggle to integrate insights across heterogeneous sensor modalities (cameras, LiDAR, IMU, and GPS), especially in edge cases where visual information is ambiguous or missing [2].

1 Wuqi Wang, Long Yang, Haigen Min, and Xiangmo Zhao are with Chang'an University, Xi'an, Shaanxi, China.
2 Xinmeng Hou is with Chang'an University, Xi'an, Shaanxi, China and Agency for Science, Technology and Research (A*STAR), Singapore.
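
To make the four-module pipeline from the abstract more concrete, the following is a minimal sketch of how such a pipeline might be wired together. It is not the authors' implementation: every name here (`SensorFrame`, `Insight`, `descriptive_agent`, `vehicle_agents`, `environment_agents`, `decision_agent`, `run_pipeline`), the prompt strings, and the fixed urgency scores are hypothetical. The LLM is modeled as a plain function from prompt text to response text, and sensor readings are replaced by short text summaries.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stand-in for an LLM call: prompt text in, response text out.
LLM = Callable[[str], str]


@dataclass
class SensorFrame:
    timestamp: float
    camera: str  # text summaries used as placeholders for raw sensor data
    lidar: str
    gps: str
    imu: str


@dataclass
class Insight:
    source: str
    text: str
    urgency: int  # illustrative scale (assumption): higher means more urgent


def descriptive_agent(frames: List[SensorFrame], keep: List[float]) -> List[SensorFrame]:
    """(i) Keep only the frames at the filtered timestamps."""
    wanted = set(keep)
    return [f for f in frames if f.timestamp in wanted]


def vehicle_agents(frame: SensorFrame, llm: LLM) -> List[Insight]:
    """(ii) LiDAR and vision agents each describe vehicle conditions."""
    return [
        Insight("lidar", llm(f"Describe vehicles from LiDAR: {frame.lidar}"), 1),
        Insight("vision", llm(f"Describe vehicles from camera: {frame.camera}"), 1),
    ]


def environment_agents(frame: SensorFrame, vehicle: List[Insight], llm: LLM) -> List[Insight]:
    """(iii) Explain the contextual change, then ask for a likely cause."""
    context = "; ".join(i.text for i in vehicle)
    change = llm(f"Explain the change given GPS {frame.gps}, IMU {frame.imu}: {context}")
    cause = llm(f"Give a likely cause for: {change}")
    return [Insight("environment", change, 2), Insight("causal", cause, 2)]


def decision_agent(insights: List[Insight], llm: LLM) -> str:
    """(iv) Rank insights by urgency and ask for a maneuver."""
    ranked = sorted(insights, key=lambda i: i.urgency, reverse=True)
    summary = " | ".join(f"[{i.source}] {i.text}" for i in ranked)
    return llm(f"Propose a maneuver given: {summary}")


def run_pipeline(frames: List[SensorFrame], keep: List[float], llm: LLM) -> List[str]:
    """Run modules (i) to (iv) and return one decision per selected frame."""
    decisions = []
    for frame in descriptive_agent(frames, keep):
        vehicle = vehicle_agents(frame, llm)
        environment = environment_agents(frame, vehicle, llm)
        decisions.append(decision_agent(vehicle + environment, llm))
    return decisions
```

Passing the LLM in as a callable keeps the sketch testable with a stub and independent of any particular model API. In a real system the urgency scores would presumably come from the agents themselves rather than from constants.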
May-6-2025