Strategic Fusion of Vision Language Models: Shapley-Credited Context-Aware Dawid-Skene for Multi-Label Tasks in Autonomous Driving

Feng, Yuxiang, Zhang, Keyang, Ouchouid, Hassane, Kaniamparambil, Ashwil, Souflas, Ioannis, Angeloudis, Panagiotis

Oct-2-2025–arXiv.org Artificial Intelligence

Abstract-- Large vision-language models (VLMs) are increasingly used in autonomous-vehicle (AV) stacks, but hallucinations limit their reliability in safety-critical pipelines. The proposed Shapley-credited, context-aware Dawid-Skene fusion method learns per-model, per-label, context-conditioned reliabilities from labelled history and, at inference, converts each model's report into an agreement-guardrailed log-likelihood ratio that is combined with a contextual prior and a public reputation state updated using Shapley-based team credit. The result is calibrated, thresholded posteriors that (i) amplify agreement among reliable models, (ii) preserve uniquely correct single-model signals, and (iii) adapt to drift. To specialise general VLMs, we curate 1,000 real-world dashcam clips with structured annotations (scene description, manoeuvre recommendation, rationale) using an automatic pipeline that fuses HDD ground truth, vehicle kinematics, and YOLOv11 + BoT-SORT tracking, guided by a three-step chain-of-thought prompt; three heterogeneous VLMs are then fine-tuned with LoRA. We evaluate with Hamming distance, Micro-/Macro-F1, and average per-video latency. Empirically, the proposed method achieves a 23% reduction in Hamming distance, a 55% improvement in Macro-F1, and a 47% improvement in Micro-F1 compared with the best single model, demonstrating VLM fusion as a calibrated, interpretable, and robust decision-support mechanism in AV pipelines.
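The log-odds combination described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each model's reliability for a label is summarised by a fixed (sensitivity, specificity) pair, that models are conditionally independent given the true label (the classic Dawid-Skene assumption), and it omits the context conditioning, the agreement guardrail, and the Shapley-updated reputation state. The function names `llr` and `fuse` are hypothetical.

```python
import math


def llr(report: int, sensitivity: float, specificity: float) -> float:
    """Log-likelihood ratio of one binary report for a single label.

    Assumed reliability model (illustrative only):
    sensitivity = P(report=1 | label present),
    specificity = P(report=0 | label absent).
    """
    if report == 1:
        return math.log(sensitivity / (1.0 - specificity))
    return math.log((1.0 - sensitivity) / specificity)


def fuse(reports, reliabilities, prior, threshold=0.5):
    """Add per-model LLRs to the prior log-odds, then threshold the posterior."""
    logit = math.log(prior / (1.0 - prior))  # prior in log-odds form
    for report, (sens, spec) in zip(reports, reliabilities):
        logit += llr(report, sens, spec)
    posterior = 1.0 / (1.0 + math.exp(-logit))  # back to a probability
    return posterior, posterior >= threshold


# Three hypothetical models voting on one label with a prior of 0.2.
posterior, decision = fuse(
    reports=[1, 1, 0],
    reliabilities=[(0.9, 0.8), (0.7, 0.9), (0.6, 0.6)],
    prior=0.2,
)
```

In this toy setting two reliable positive reports outweigh one weak negative report, so the posterior rises well above the prior. A model with sensitivity and specificity both at 0.5 contributes a log-likelihood ratio of zero and leaves the posterior unchanged.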

large language model, machine learning, natural language


Country:
- Europe (0.46)

Genre:
- Research Report (0.82)

Industry:
- Automobiles & Trucks (1.00)
- Information Technology > Robotics & Automation (0.84)
- Transportation > Ground
  - Road (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Vision (1.00)
  - Robots > Autonomous Vehicles (1.00)
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.70)
