DriveMind: A Dual-VLM based Reinforcement Learning Framework for Autonomous Driving

Dawood Wasif, Terrence J. Moore, Chandan K. Reddy, Jin-Hee Cho

arXiv.org Artificial Intelligence 

Recent advances in autonomous vehicles have shifted development from rigid pipelines to end-to-end neural policies that map raw sensor streams directly to control commands [1-3]. While these models offer streamlined architectures and strong benchmark performance, they raise critical deployment concerns. Their internal logic is opaque, complicating validation in safety-critical settings. They struggle to generalize to rare events such as severe weather or infrastructure damage, and they lack formal guarantees on kinematic properties such as speed limits and lane-keeping. Further, they provide no natural interface for human oversight or explanation. These challenges motivate frameworks that combine the expressiveness of deep networks with transparency, robustness, and provable safety.

Meanwhile, Large Language Models (LLMs) and Vision-Language Models (VLMs) have demonstrated human-level reasoning and visual grounding [4-6]. Recent works such as VLM-SR (Shaped Rewards) [7], VLM-RM (Reward Models) [8], and RoboCLIP (Language-Conditioned Robot Learning via Contrastive Language-Image Pretraining) [9] inject semantic feedback into Reinforcement Learning (RL), but they rely on static prompts that are unsuited to evolving road conditions and overlook vehicle dynamics.
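The static-prompt reward shaping these works perform can be sketched as scoring each observation against a fixed text prompt in a shared embedding space (the CLIP-style pattern RoboCLIP and VLM-RM build on). The sketch below is illustrative only: `encode_text` is a deterministic stand-in for a real VLM text encoder, and the function names `vlm_shaped_reward` and `cosine_similarity` are hypothetical, not taken from the cited works.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def encode_text(prompt: str, dim: int = 8) -> np.ndarray:
    """Stand-in for a VLM text encoder (e.g., CLIP's); a real system
    would call the model here. Returns a unit-norm pseudo-embedding
    derived deterministically from the prompt within one run."""
    rng = np.random.default_rng(abs(hash(prompt)) % (2**32))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)


def vlm_shaped_reward(image_embedding: np.ndarray,
                      prompt_embedding: np.ndarray,
                      scale: float = 1.0) -> float:
    """Static-prompt shaping: reward is the similarity of the current
    frame's embedding to one fixed goal prompt. Note the limitation the
    text describes: the prompt never adapts to changing road conditions,
    and no vehicle dynamics enter the reward."""
    return scale * cosine_similarity(image_embedding, prompt_embedding)


# Usage sketch: one goal prompt scores every frame for the whole episode.
goal = encode_text("stay centered in the lane at a safe speed")
aligned_frame = goal            # frame embedding matching the prompt
opposed_frame = -goal           # frame embedding opposing the prompt
print(vlm_shaped_reward(aligned_frame, goal))   # high reward
print(vlm_shaped_reward(opposed_frame, goal))   # low reward
```

A perfectly aligned frame yields the maximum shaped reward while a misaligned one is penalized, which illustrates why a single static prompt cannot distinguish, say, clear-weather from degraded-visibility driving contexts.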