DriveAgent: Multi-Agent Structured Reasoning with LLM and Multimodal Sensor Fusion for Autonomous Driving

Hou, Xinmeng, Wang, Wuqi, Yang, Long, Lin, Hao, Feng, Jinglun, Min, Haigen, Zhao, Xiangmo

May-6-2025–arXiv.org Artificial Intelligence

Xinmeng Hou 2, Wuqi Wang 1, Long Yang 1, Hao Lin 3, Jinglun Feng 4, Haigen Min 1, Xiangmo Zhao 1

Abstract: We introduce DriveAgent, a novel multi-agent autonomous driving framework that combines large language model (LLM) reasoning with multimodal sensor fusion to enhance situational understanding and decision-making. DriveAgent integrates diverse sensor modalities (camera, LiDAR, GPS, and IMU) with LLM-driven analytical processes structured across specialized agents. The framework operates through a modular agent-based pipeline comprising four principal modules: (i) a descriptive analysis agent that identifies critical sensor data events based on filtered timestamps, (ii) dedicated vehicle-level analysis by LiDAR and vision agents that collaboratively assess vehicle conditions and movements, (iii) environmental reasoning and causal analysis agents that explain contextual changes and their underlying mechanisms, and (iv) an urgency-aware decision-generation agent that prioritizes insights and proposes timely maneuvers. This modular design lets the LLM coordinate specialized perception and reasoning agents and deliver cohesive, interpretable insights into complex autonomous driving scenarios. Extensive experiments on challenging autonomous driving datasets show that DriveAgent outperforms baseline methods on multiple metrics. These results validate the efficacy of the proposed LLM-driven multi-agent sensor fusion framework and underscore its potential to substantially enhance the robustness and reliability of autonomous driving systems.

I. INTRODUCTION

Autonomous driving (AD) has made promising progress in recent years; however, challenging problems such as contextual understanding and interpretability remain unsolved, especially in dynamic, multimodal environments [1]. Commonly adopted AD architectures, whether modular or end-to-end, often struggle to integrate insights across heterogeneous sensor modalities such as cameras, LiDAR, IMU, and GPS, especially in edge cases where visual information is ambiguous or missing [2].

1 Wuqi Wang, Long Yang, Haigen Min, and Xiangmo Zhao are with Chang'an University, Xi'an, Shaanxi, China.
2 Xinmeng Hou is with Chang'an University, Xi'an, Shaanxi, China and Agency for Science, Technology and Research (A*STAR), Singapore.
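
As a rough illustration of the four-module pipeline described in the abstract, the following Python sketch chains four stub agents in the stated order: event filtering, vehicle-level analysis, causal reasoning, and urgency-aware decision generation. It is not the authors' implementation. Every name, field, threshold, and rule below (SensorFrame, accel_threshold, the 5 m and 10 m distances) is a hypothetical assumption chosen for the example, and simple deterministic rules stand in for the LLM-driven agents.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SensorFrame:
    """One synchronized sample; every field is a simplified, hypothetical stand-in."""
    timestamp: float
    camera_caption: str       # text summary of the camera image
    lidar_min_range_m: float  # distance to the nearest LiDAR return, in meters
    imu_accel_ms2: float      # longitudinal acceleration, in m/s^2


def descriptive_agent(frames: List[SensorFrame],
                      accel_threshold: float = 2.0) -> List[SensorFrame]:
    """Module (i): keep only frames whose IMU reading suggests an event."""
    return [f for f in frames if abs(f.imu_accel_ms2) >= accel_threshold]


def vehicle_agents(frame: SensorFrame) -> List[str]:
    """Module (ii): LiDAR and vision stubs each contribute one observation."""
    lidar_note = f"nearest object at {frame.lidar_min_range_m:.1f} m"
    vision_note = f"camera sees: {frame.camera_caption}"
    return [lidar_note, vision_note]


def causal_agent(frame: SensorFrame, notes: List[str]) -> str:
    """Module (iii): attach a rule-based explanation to the observations."""
    if frame.imu_accel_ms2 < 0 and frame.lidar_min_range_m < 10.0:
        explanation = "braking is consistent with a close obstacle ahead"
    else:
        explanation = "no clear cause identified"
    return f"{explanation} (evidence: {'; '.join(notes)})"


def decision_agent(frame: SensorFrame, cause: str) -> Dict[str, object]:
    """Module (iv): assign an urgency level and propose a maneuver."""
    urgent = frame.lidar_min_range_m < 5.0
    return {
        "timestamp": frame.timestamp,
        "urgency": "high" if urgent else "low",
        "maneuver": "brake" if urgent else "maintain speed",
        "cause": cause,
    }


def run_pipeline(frames: List[SensorFrame]) -> List[Dict[str, object]]:
    """Run modules (i) through (iv) in order and collect one decision per event."""
    decisions = []
    for frame in descriptive_agent(frames):
        notes = vehicle_agents(frame)
        cause = causal_agent(frame, notes)
        decisions.append(decision_agent(frame, cause))
    return decisions
```

Calling run_pipeline on a list of SensorFrame objects returns one decision dictionary per frame that passes the event filter. In a system like the one the abstract describes, each stub would presumably be replaced by an LLM-backed agent; the sketch only shows how the stages hand results to one another.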

artificial intelligence, large language model, natural language


Country:
- Asia > China > Shaanxi Province > Xi'an (0.44)

Genre:
- Research Report (0.50)

Industry:
- Information Technology > Robotics & Automation (1.00)
- Automobiles & Trucks (1.00)
- Transportation > Ground
  - Road (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Robots > Autonomous Vehicles (1.00)
  - Representation & Reasoning > Agents (1.00)
  - Natural Language > Large Language Model (1.00)
