eye-gaze data


Eye-gaze Guided Multi-modal Alignment for Medical Representation Learning

Ma, Chong, Jiang, Hanqi, Chen, Wenting, Li, Yiwei, Wu, Zihao, Yu, Xiaowei, Liu, Zhengliang, Guo, Lei, Zhu, Dajiang, Zhang, Tuo, Shen, Dinggang, Liu, Tianming, Li, Xiang

Neural Information Processing Systems

In medical multi-modal frameworks, aligning cross-modality features presents a significant challenge. Existing works learn implicitly aligned features from the data, without considering the explicit relationships that hold in the medical context. This reliance on data may limit how well the learned alignment relationships generalize. In this work, we propose the Eye-gaze Guided Multi-modal Alignment (EGMA) framework, which harnesses eye-gaze data for better alignment of medical visual and textual features. We explore the natural auxiliary role of radiologists' eye-gaze data in aligning medical images and text, and introduce a novel approach that uses eye-gaze data collected synchronously while radiologists perform diagnostic evaluations. We conduct downstream image classification and image-text retrieval tasks on four medical datasets, where EGMA achieves state-of-the-art performance and stronger generalization across datasets. Additionally, we examine the impact of varying amounts of eye-gaze data on model performance, highlighting the feasibility and utility of integrating this auxiliary data into multi-modal alignment frameworks.
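The framework thus combines an implicit signal (image-text contrastive alignment) with an explicit one (gaze supervision). Below is a minimal PyTorch sketch of how such a combined objective could look; the function and tensor names (gaze_alignment_loss, patch_feats, gaze_heatmap) and the KL-based formulation are illustrative assumptions, not the authors' actual implementation.

```python
# Minimal sketch of gaze-guided image-text alignment in the spirit of EGMA.
# All names below are illustrative assumptions, not the authors' code.
import torch
import torch.nn.functional as F

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Standard symmetric InfoNCE over global image/text embeddings."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature          # (B, B)
    targets = torch.arange(logits.size(0), device=logits.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

def gaze_alignment_loss(patch_feats, token_feats, gaze_heatmap, temperature=0.07):
    """Pull patch-token attention toward the radiologist's recorded gaze.

    patch_feats:  (B, P, D) visual patch embeddings
    token_feats:  (B, T, D) report token embeddings
    gaze_heatmap: (B, P)    gaze density over patches, summing to 1 per image
    """
    patch_feats = F.normalize(patch_feats, dim=-1)
    token_feats = F.normalize(token_feats, dim=-1)
    # Token-to-patch attention: which image regions each word attends to.
    sim = token_feats @ patch_feats.transpose(1, 2) / temperature  # (B, T, P)
    attn = sim.softmax(dim=-1).mean(dim=1)                         # (B, P)
    # KL divergence between model attention and recorded gaze density.
    return F.kl_div(attn.clamp_min(1e-8).log(), gaze_heatmap,
                    reduction="batchmean")

# Combined objective: implicit (contrastive) plus explicit (gaze) alignment,
# weighted by a hypothetical coefficient lam.
# loss = clip_contrastive_loss(img_emb, txt_emb) + lam * gaze_alignment_loss(...)
```

In this sketch the gaze term simply pulls the model's token-to-patch attention toward the recorded gaze density, which is one straightforward way to make the medical-context relationship explicit rather than purely data-driven.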




Action is in the Eye of the Beholder: Eye-gaze Driven Model for Spatio-Temporal Action Localization (Reviews)

Neural Information Processing Systems

"NIPS 2013 Neural Information Processing Systems December 5 - 10, Lake Tahoe, Nevada, USA",,, "Paper ID:","1139" "Title:","Action is in the Eye of the Beholder: Eye-gaze Driven Model for Spatio-Temporal Action Localization" Reviews First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. This paper proposes a method for action detection (localization and classification of actions) using weakly supervised information (action labels + eye gaze information, no explicit definition of bounding boxes). Overall, the spatio-temporal search (a huge spatio-temporal space) is done using dynamic programming and a max-path algorithm. Gaze information is introduced into framework through a loss which acounts for gaze density at a given location. QUALITY: The paper seems technically sound and makes for a nice study given gaze information.


EG-SpikeFormer: Eye-Gaze Guided Transformer on Spiking Neural Networks for Medical Image Analysis

Pan, Yi, Jiang, Hanqi, Chen, Junhao, Li, Yiwei, Zhao, Huaqin, Zhou, Yifan, Shu, Peng, Wu, Zihao, Liu, Zhengliang, Zhu, Dajiang, Li, Xiang, Abate, Yohannes, Liu, Tianming

arXiv.org Artificial Intelligence

Neuromorphic computing has emerged as a promising energy-efficient alternative to traditional artificial intelligence, predominantly realized as spiking neural networks (SNNs) implemented on neuromorphic hardware. Significant advances have been made in SNN-based convolutional neural networks (CNNs) and Transformer architectures. However, neuromorphic computing for the medical imaging domain remains underexplored. In this study, we introduce EG-SpikeFormer, an SNN architecture tailored for clinical tasks that incorporates eye-gaze data to guide the model's attention to diagnostically relevant regions in medical images. This approach effectively addresses the shortcut-learning issues commonly observed in conventional models, especially in scenarios with limited clinical data and high demands on model reliability, generalizability, and transparency. EG-SpikeFormer not only demonstrates superior energy efficiency and performance in medical image prediction tasks but also enhances clinical relevance through multi-modal information alignment. By incorporating eye-gaze data, the model improves interpretability and generalization, opening new directions for applying neuromorphic computing in healthcare.
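As a rough illustration of the gaze-guidance idea, the sketch below biases standard self-attention logits toward patches with high gaze density. It deliberately omits the spiking machinery (a real SNN would replace softmax attention with spike-based operations trained via surrogate gradients); the class name and the log-prior bias are assumptions, not EG-SpikeFormer's architecture.

```python
# Hedged PyTorch sketch: add a gaze-derived prior to attention logits so
# that keys under the radiologist's gaze receive higher attention weight.
import torch
import torch.nn as nn

class GazeBiasedAttention(nn.Module):
    def __init__(self, dim, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x, gaze_density):
        """x:            (B, P, D) patch tokens
           gaze_density: (B, P) gaze probability per patch (sums to 1)"""
        # Additive log-prior bias on the keys, broadcast over heads/queries.
        bias = torch.log(gaze_density.clamp_min(1e-6))       # (B, P)
        B, P = bias.shape
        h = self.attn.num_heads
        mask = bias[:, None, None, :].expand(B, h, P, P)     # (B, h, P, P)
        mask = mask.reshape(B * h, P, P)                     # float attn_mask
        out, _ = self.attn(x, x, x, attn_mask=mask)
        return out
```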

