ModalChorus: Visual Probing and Alignment of Multi-modal Embeddings via Modal Fusion Map
Yilin Ye, Shishi Xiao, Xingchen Zeng, Wei Zeng
Multi-modal embeddings form the foundation of vision-language models; CLIP embeddings, the most widely used text-image embeddings, are a prominent example. However, these embeddings are vulnerable to subtle misalignment of cross-modal features, which degrades model performance and generalization. To address this problem, we design ModalChorus, an interactive system for the visual probing and alignment of multi-modal embeddings. ModalChorus offers a two-stage process: 1) embedding probing with Modal Fusion Map (MFM), a novel parametric dimensionality reduction method that integrates both metric and nonmetric objectives to enhance modality fusion; and 2) embedding alignment that allows users to interactively articulate their intentions for both point-set and set-set alignments. Quantitative and qualitative comparisons of CLIP embeddings against existing dimensionality reduction (e.g., t-SNE and MDS) and data fusion (e.g., data context map) methods demonstrate the advantages of MFM in showcasing cross-modal features on common vision-language datasets. Case studies show that ModalChorus enables intuitive discovery of misalignment and efficient re-alignment in scenarios ranging from zero-shot classification to cross-modal retrieval and generation.
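To make the abstract's central idea concrete, below is a minimal sketch of a parametric 2-D projector trained with a combined metric (MDS-style stress) and nonmetric (t-SNE-style neighborhood) loss. Everything in it, the network shape, the fixed Gaussian bandwidth, the Student-t low-dimensional affinities, and the 0.5/0.5 loss weighting, is an illustrative assumption; this is not the paper's actual Modal Fusion Map formulation.

```python
# Illustrative sketch only: a parametric projector with a combined
# metric (MDS-style stress) + nonmetric (t-SNE-style KL) objective.
import torch
import torch.nn as nn

class Projector(nn.Module):
    """Parametric map from high-dimensional embeddings to 2-D coordinates."""
    def __init__(self, dim_in: int, dim_out: int = 2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim_in, 128), nn.ReLU(),
            nn.Linear(128, dim_out),
        )

    def forward(self, x):
        return self.net(x)

def metric_stress(x, y):
    # MDS-style metric objective: preserve pairwise distances,
    # scale-normalized so the two spaces are comparable.
    dx = torch.cdist(x, x)
    dy = torch.cdist(y, y)
    return ((dx / dx.mean() - dy / (dy.mean() + 1e-12)) ** 2).mean()

def nonmetric_kl(x, y, sigma=1.0):
    # t-SNE-style nonmetric objective: preserve neighborhood structure.
    # Simplified: fixed Gaussian bandwidth instead of per-point perplexity.
    n = x.shape[0]
    mask = torch.eye(n, dtype=torch.bool)
    p = torch.softmax(
        (-torch.cdist(x, x) ** 2 / (2 * sigma ** 2)).masked_fill(mask, float("-inf")),
        dim=1,
    )
    q = 1.0 / (1.0 + torch.cdist(y, y) ** 2)  # Student-t affinities, as in t-SNE
    q = q.masked_fill(mask, 0.0)
    q = q / q.sum(dim=1, keepdim=True)
    return (p * (p.clamp_min(1e-12) / q.clamp_min(1e-12)).log()).sum(dim=1).mean()

# Toy usage on stand-in data for fused 512-D image+text embeddings.
emb = torch.randn(256, 512)
model = Projector(512)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(200):
    coords = model(emb)
    loss = 0.5 * metric_stress(emb, coords) + 0.5 * nonmetric_kl(emb, coords)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The point of combining the two terms is that the stress term keeps global inter-modality distances honest while the KL term keeps local neighborhoods tight, which is the trade-off the abstract attributes to integrating metric and nonmetric objectives.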
arXiv.org Artificial Intelligence
Jul-17-2024
- Genre:
  - Research Report (1.00)
- Industry:
  - Transportation (0.46)
- Technology:
  - Information Technology
    - Artificial Intelligence
      - Machine Learning
        - Neural Networks (0.93)
        - Statistical Learning (1.00)
      - Natural Language (1.00)
      - Representation & Reasoning (1.00)
      - Vision (1.00)
    - Data Science (1.00)