The Multi-Faceted Monosemanticity in Multimodal Representations
Hanqi Yan, Xiangxiang Cui, Lu Yin, Paul Pu Liang, Yulan He, Yifei Wang
–arXiv.org Artificial Intelligence
In this paper, we leverage recent advancements in feature monosemanticity to extract interpretable features from deep multimodal models, offering a data-driven understanding of modality gaps. Specifically, we investigate CLIP (Contrastive Language-Image Pretraining), a prominent vision-language representation model trained on extensive image-text pairs. Building upon interpretability tools developed for single-modal models, we extend these methodologies to assess the multimodal interpretability of CLIP features. Additionally, we introduce the Modality Dominance Score (MDS) to attribute the interpretability of each feature to its respective modality. Next, we transform CLIP features into a more interpretable space, enabling us to categorize them into three distinct classes: vision features (single-modal), language features (single-modal), and visual-language features (cross-modal). Our findings reveal that this categorization aligns closely with human cognitive understanding of the different modalities. We also demonstrate significant use cases of these modality-specific features, including detecting gender bias, defending against adversarial attacks, and editing text-to-image models. These results indicate that large-scale multimodal models, equipped with task-agnostic interpretability tools, offer valuable insights into key connections and distinctions between different modalities.
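The abstract does not give the exact formula for the Modality Dominance Score, so the sketch below is only a rough illustration of the idea. It assumes you already have per-feature activations on image inputs and on text inputs (e.g., from a sparse, monosemantic dictionary learned on CLIP embeddings), scores modality dominance as a normalized contrast of mean activations, and buckets features into the three classes named in the abstract. The function names, the contrast formula, and the threshold `tau` are all assumptions for illustration, not the paper's definitions.

```python
import numpy as np

def modality_dominance_score(img_acts: np.ndarray,
                             txt_acts: np.ndarray) -> np.ndarray:
    """Hypothetical MDS: per-feature contrast between mean activation
    on image inputs vs. text inputs, mapped to [-1, 1].

    img_acts: (n_images, n_features) non-negative feature activations on image embeddings
    txt_acts: (n_texts,  n_features) non-negative feature activations on text embeddings
    """
    mu_img = img_acts.mean(axis=0)
    mu_txt = txt_acts.mean(axis=0)
    return (mu_img - mu_txt) / (mu_img + mu_txt + 1e-8)

def categorize(mds: np.ndarray, tau: float = 0.5) -> np.ndarray:
    """Bucket each feature into one of the three classes from the abstract."""
    labels = np.full(mds.shape, "visual-language (cross-modal)", dtype=object)
    labels[mds > tau] = "vision (single-modal)"
    labels[mds < -tau] = "language (single-modal)"
    return labels

# Toy usage: random activations stand in for real CLIP-feature activations.
rng = np.random.default_rng(0)
img_acts = rng.random((100, 8))
txt_acts = rng.random((200, 8))
mds = modality_dominance_score(img_acts, txt_acts)
print(list(zip(np.round(mds, 2), categorize(mds))))
```

Under this toy formulation, a score near +1 means the feature fires almost exclusively on images, near -1 almost exclusively on text, and near 0 on both, which is one simple way to operationalize the vision / language / cross-modal split the paper describes.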
Feb-16-2025