Eye-gaze Guided Multi-modal Alignment for Medical Representation Learning

May-28-2025, 10:11:32 GMT–Neural Information Processing Systems

In medical multi-modal frameworks, aligning cross-modality features presents a significant challenge. Existing works learn features that are implicitly aligned from the data, without considering the explicit relationships in the medical context. This reliance on data may lead to poor generalization of the learned alignment relationships. In this work, we propose the Eye-gaze Guided Multi-modal Alignment (EGMA) framework to harness eye-gaze data for better alignment of medical visual and textual features. We explore the natural auxiliary role of radiologists' eye-gaze data in aligning medical images and text, and introduce a novel approach that uses eye-gaze data collected synchronously from radiologists during diagnostic evaluations. We evaluate on downstream tasks of image classification and image-text retrieval across four medical datasets, where EGMA achieves state-of-the-art performance and stronger generalization across different datasets. Additionally, we explore the impact of varying amounts of eye-gaze data on model performance, highlighting the feasibility and utility of integrating this auxiliary data into multi-modal alignment frameworks.

Country:
- Asia > Middle East
  - Israel (0.14)
- North America > United States (0.46)

Genre:
- Research Report > Experimental Study (1.00)

Industry:
- Health & Medicine
  - Diagnostic Medicine > Imaging (1.00)
  - Nuclear Medicine (1.00)
  - Therapeutic Area (1.00)

Technology:
- Information Technology
  - Artificial Intelligence
    - Machine Learning > Neural Networks
      - Deep Learning (0.93)
    - Natural Language (1.00)
    - Representation & Reasoning (1.00)
    - Vision (1.00)
  - Data Science (1.00)
  - Sensing and Signal Processing > Image Processing (1.00)