A Unified Perception-Language-Action Framework for Adaptive Autonomous Driving

Zhang, Yi, Haß, Erik Leo, Chao, Kuo-Yi, Petrovic, Nenad, Song, Yinglei, Wu, Chengdong, Knoll, Alois

arXiv.org Artificial Intelligence 

Chair of Robotics, Artificial Intelligence and Embedded Systems, Technical University of Munich (TUM), Munich, Germany

Abstract -- Autonomous driving systems face significant challenges in achieving human-like adaptability, robustness, and interpretability in complex, open-world environments. These challenges stem from fragmented architectures, limited generalization to novel scenarios, and insufficient semantic extraction from perception. To address these limitations, we propose a unified Perception-Language-Action (PLA) framework that integrates multi-sensor fusion (cameras, LiDAR, radar) with a large language model (LLM)-augmented Vision-Language-Action (VLA) architecture, specifically a GPT-4.1-powered reasoning core. This framework unifies low-level sensory processing with high-level contextual reasoning, tightly coupling perception with natural language-based semantic understanding and decision-making to enable context-aware, explainable, and safety-bounded autonomous driving.
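The abstract describes a loop in which fused sensor data is lifted into natural language, reasoned over by an LLM core, and mapped back to a bounded driving action. Below is a minimal, runnable Python sketch of that loop under stated assumptions: all class and function names (FusedScene, fuse_sensors, scene_to_prompt, llm_reason, safety_bound) are illustrative inventions, not the paper's API, and the GPT-4.1 call is replaced by an offline stub so the example executes without network access.

```python
"""Minimal sketch of the Perception-Language-Action (PLA) loop from the
abstract. All names are illustrative assumptions; the LLM is stubbed."""

from dataclasses import dataclass


@dataclass
class FusedScene:
    """Hypothetical result of multi-sensor fusion (camera + LiDAR + radar)."""
    objects: list          # semantic labels from camera detection
    distances_m: list      # per-object range estimates from LiDAR/radar
    ego_speed_mps: float


def fuse_sensors(camera_objs, lidar_ranges, radar_ranges, ego_speed):
    """Toy fusion step: average LiDAR and radar range estimates per object."""
    distances = [(l + r) / 2.0 for l, r in zip(lidar_ranges, radar_ranges)]
    return FusedScene(objects=camera_objs, distances_m=distances,
                      ego_speed_mps=ego_speed)


def scene_to_prompt(scene):
    """Serialize the fused scene into natural language for the reasoning core,
    giving the LLM the semantic context the abstract calls for."""
    lines = [f"Ego speed: {scene.ego_speed_mps:.1f} m/s."]
    for obj, dist in zip(scene.objects, scene.distances_m):
        lines.append(f"- {obj} at {dist:.1f} m")
    return "Scene:\n" + "\n".join(lines) + "\nChoose one action: BRAKE, SLOW, KEEP."


def llm_reason(prompt):
    """Stub standing in for the GPT-4.1 reasoning core. A real system would
    query the model here; a trivial rule keeps the sketch self-contained."""
    if "pedestrian" in prompt:
        return "SLOW: pedestrian present, reduce speed and monitor."
    return "KEEP: no near-field hazards."


def safety_bound(action, scene):
    """Hard safety envelope: a deterministic check can always override the
    LLM's proposal, making the decision safety-bounded."""
    if scene.distances_m and min(scene.distances_m) < 5.0:
        return "BRAKE (safety override: object within 5 m)"
    return action


if __name__ == "__main__":
    scene = fuse_sensors(camera_objs=["pedestrian", "parked car"],
                         lidar_ranges=[9.2, 24.8], radar_ranges=[9.0, 25.4],
                         ego_speed=8.0)
    action = llm_reason(scene_to_prompt(scene))
    print(safety_bound(action, scene))  # explainable natural-language decision
```

The deterministic override in safety_bound reflects the abstract's "safety-bounded" claim in this reading: the language model proposes an interpretable action, but a hard-coded envelope retains final authority.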
