Seeing Sound, Hearing Sight: Uncovering Modality Bias and Conflict of AI models in Sound Localization

Jun-22-2026, 22:33:26 GMT–Neural Information Processing Systems

Imagine hearing a dog bark and instinctively turning toward the sound--only to find a parked car, while a silent dog sits nearby. Such moments of sensory conflict challenge perception, yet humans flexibly resolve these discrepancies, prioritizing auditory cues over misleading visuals to accurately localize sounds. Despite the rapid advancement of multimodal AI models that integrate vision and sound, little is known about how these systems handle cross-modal conflicts or whether they favor one modality over another. Here, we systematically and quantitatively examine modality bias and conflict resolution in AI models for Sound Source Localization (SSL). We evaluate a wide range of state-of-the-art multimodal models and compare them against human performance in psychophysics experiments spanning six audiovisual conditions, including congruent, conflicting, and absent visual and audio cues.

artificial intelligence, machine learning, natural language

Country:
- Asia (0.46)
- North America > United States (0.28)

Genre:
- Research Report
  - Experimental Study (1.00)
  - New Finding (0.67)

Industry:
- Health & Medicine (0.93)

Technology:
- Information Technology
  - Sensing and Signal Processing > Image Processing (0.67)
  - Artificial Intelligence
    - Vision (1.00)
    - Representation & Reasoning (1.00)
    - Natural Language (1.00)
    - Cognitive Science (0.68)
    - Machine Learning > Neural Networks
      - Deep Learning (0.46)
