SEAL: Vision-Language Model-Based Safe End-to-End Cooperative Autonomous Driving with Adaptive Long-Tail Modeling

You, Junwei; Li, Pei; Jiang, Zhuoyu; Huang, Zilin; Gan, Rui; Shi, Haotian; Ran, Bin

Jul-8-2025–arXiv.org Artificial Intelligence

Autonomous driving has the potential to revolutionize transportation systems by operating with minimal or no human intervention, thereby eliminating many errors caused by human drivers. However, it faces significant safety challenges in diverse traffic scenarios, extreme weather conditions, and complex interactions with human-driven vehicles (Zhou et al. (2023); Xu et al. (2025)). Existing studies suggest that extreme weather, including heavy rain, snow, and fog, can substantially degrade the ability of autonomous driving systems to detect objects, plan trajectories, and make driving decisions (Zang et al. (2019); Mehra et al. (2020)), raising serious safety concerns. There is therefore an urgent need for autonomous driving technologies that remain robust under rare and diverse environmental conditions, so that they operate safely and reliably.

End-to-end autonomous driving has become the prevailing paradigm for high-level autonomy, offering a unified learning-based framework that maps raw sensory inputs directly to driving actions. Compared with traditional modular pipelines, which decompose driving into discrete stages (Figure 1a), end-to-end frameworks simplify system design, reduce interface mismatches between modules, and enable global optimization of the driving policy. Among recent advances, vision-language models (VLMs) have emerged as powerful backbones for end-to-end reasoning (Xiao et al. (2024); Chen et al. (2024); Feng et al. (2025)). By jointly encoding visual scenes and natural-language prompts, VLMs provide rich semantic grounding and flexible context understanding, which makes them a natural fit for autonomous driving tasks. Building on these capabilities, recent works have introduced VLM-based end-to-end pipelines (Figure 1b) that show promising improvements in semantic generalization and reasoning ability (Hwang et al. (2024); Xing et al. (2025); Zhou et al. (2025); Huang et al. (2024); Tian et al. (2024b)).

artificial intelligence, machine learning, natural language


Country:
- Asia (0.28)
- North America > United States
  - Wisconsin (0.28)

Genre:
- Research Report > New Finding (0.66)

Industry:
- Information Technology > Robotics & Automation (1.00)
- Automobiles & Trucks (1.00)
- Transportation > Ground
  - Road (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Vision (1.00)
  - Robots > Autonomous Vehicles (1.00)
  - Natural Language (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.46)
