A Unified Perception-Language-Action Framework for Adaptive Autonomous Driving
Yi Zhang, Erik Leo Haß, Kuo-Yi Chao, Nenad Petrovic, Yinglei Song, Chengdong Wu, Alois Knoll
arXiv.org Artificial Intelligence
Chair of Robotics, Artificial Intelligence and Embedded Systems, Technical University of Munich (TUM), Munich, Germany
{yi1228.zhang, erik-leo.hass,

Abstract -- Autonomous driving systems face significant challenges in achieving human-like adaptability, robustness, and interpretability in complex, open-world environments. These challenges stem from fragmented architectures, limited generalization to novel scenarios, and insufficient semantic extraction from perception. To address these limitations, we propose a unified Perception-Language-Action (PLA) framework that integrates multi-sensor fusion (cameras, LiDAR, radar) with a large language model (LLM)-augmented Vision-Language-Action (VLA) architecture, specifically a GPT-4.1-powered reasoning core. This framework unifies low-level sensory processing with high-level contextual reasoning, tightly coupling perception with natural language-based semantic understanding and decision-making to enable context-aware, explainable, and safety-bounded autonomous driving.
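The perception-language-action loop described in the abstract can be sketched minimally as follows. This is an illustrative skeleton only: the dataclass fields, the rule-based `reason` stub standing in for the GPT-4.1 reasoning core, and the `act` mapping are all hypothetical names invented here, not the paper's actual interfaces.

```python
from dataclasses import dataclass

@dataclass
class Perception:
    """Fused multi-sensor observation (fields are illustrative)."""
    obstacle_distance_m: float
    scene_description: str

def reason(perception: Perception) -> str:
    # Placeholder for the LLM reasoning core (e.g. GPT-4.1 in the paper):
    # it would consume the semantic scene description and return a
    # natural-language decision. A simple rule stands in for the model.
    if perception.obstacle_distance_m < 10.0:
        return "brake: obstacle ahead"
    return "proceed: path clear"

def act(decision: str, max_decel_mps2: float = 8.0) -> dict:
    # Map the language-level decision to a safety-bounded control command;
    # the brake output is clamped to [0, 1] regardless of the request.
    if decision.startswith("brake"):
        return {"throttle": 0.0, "brake": min(1.0, max_decel_mps2 / 8.0)}
    return {"throttle": 0.3, "brake": 0.0}

obs = Perception(obstacle_distance_m=6.5, scene_description="pedestrian crossing")
decision = reason(obs)
command = act(decision)
print(decision, command)
```

In a real system, `reason` would be an LLM call whose free-text answer is parsed before actuation, with the safety bounds in `act` enforced independently of the model's output.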
Aug-1-2025
- Country:
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.44)
- Genre:
- Research Report > New Finding (0.46)
- Industry:
- Automobiles & Trucks (0.86)
- Information Technology > Robotics & Automation (0.86)
- Transportation > Ground > Road (0.96)
- Technology: