3b6d18473eb525df8008868f1390cc8c-Paper-Datasets_and_Benchmarks_Track.pdf

Jun-16-2026, 13:26:42 GMT–Neural Information Processing Systems

Spurious correlations occur when models rely on non-essential features that coincidentally co-vary with target labels, leading to incorrect reasoning under distribution shift. We consider spurious correlations in Large Vision Language Models (LVLMs) pretrained on extensive and diverse datasets without explicit task supervision. We develop a benchmark by sourcing GPT-4o errors on real-world visual-question-answering (VQA) benchmarks, then curating a subset through LVLM-human annotation and synthetic counterfactual evaluation to identify errors caused by spurious correlations. This process yields SpuriVerse, a novel benchmark comprised of 124 distinct types of spurious correlations extracted from real-world datasets, each containing 1 realistic and 10 synthetic VQA samples for a total of 1364 multiple choice questions. We evaluate 15 open and closed-source LVLMs on SpuriVerse, finding that even state-of-the-art closed-source models struggle significantly, achieving at best only 35.0% accuracy. Fine-tuning on synthetic examples that emphasize the spurious correlation improves performance to 78.4%, suggesting that training on diverse spurious patterns generalizes to unseen situations: models appear to learn to avoid "shortcuts" and attend to the overall image context.

large language model, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Jun-16-2026, 13:26:42 GMT

Conferences PDF

Add feedback

Country:
- North America > United States (1.00)

Genre:
- Research Report
  - Experimental Study (1.00)
  - New Finding (0.67)

Industry:
- Government
  - Military (0.46)
  - Regional Government > North America Government
    - United States Government (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language
    - Large Language Model (1.00)
    - Chatbot (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found