eye-gaze data


Eye-gaze Guided Multi-modal Alignment for Medical Representation Learning

Ma, Chong, Jiang, Hanqi, Chen, Wenting, Li, Yiwei, Wu, Zihao, Yu, Xiaowei, Liu, Zhengliang, Guo, Lei, Zhu, Dajiang, Zhang, Tuo, Shen, Dinggang, Liu, Tianming, Li, Xiang

Neural Information Processing Systems

In medical multi-modal frameworks, aligning cross-modality features presents a significant challenge. Existing works learn implicitly aligned features from the data, without considering the explicit relationships that hold in the medical context. This reliance on data may limit how well the learned alignment relationships generalize. In this work, we propose the Eye-gaze Guided Multi-modal Alignment (EGMA) framework, which harnesses eye-gaze data for better alignment of medical visual and textual features. We explore the natural auxiliary role of radiologists' eye-gaze data in aligning medical images and text, and introduce a novel approach that uses eye-gaze data collected synchronously while radiologists perform diagnostic evaluations. We conduct downstream image classification and image-text retrieval tasks on four medical datasets, where EGMA achieves state-of-the-art performance and stronger generalization across datasets. Additionally, we examine the impact of varying amounts of eye-gaze data on model performance, highlighting the feasibility and utility of integrating this auxiliary data into multi-modal alignment frameworks.
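The framework thus combines an implicit signal (image-text contrastive alignment) with an explicit one (gaze supervision). Below is a minimal PyTorch sketch of how such a combined objective could look; the function and tensor names (gaze_alignment_loss, patch_feats, gaze_heatmap) and the KL-based formulation are illustrative assumptions, not the authors' actual implementation.

```python
# Minimal sketch of gaze-guided image-text alignment in the spirit of EGMA.
# All names below are illustrative assumptions, not the authors' code.
import torch
import torch.nn.functional as F

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Standard symmetric InfoNCE over global image/text embeddings."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature          # (B, B)
    targets = torch.arange(logits.size(0), device=logits.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

def gaze_alignment_loss(patch_feats, token_feats, gaze_heatmap, temperature=0.07):
    """Pull patch-token attention toward the radiologist's recorded gaze.

    patch_feats:  (B, P, D) visual patch embeddings
    token_feats:  (B, T, D) report token embeddings
    gaze_heatmap: (B, P)    gaze density over patches, summing to 1 per image
    """
    patch_feats = F.normalize(patch_feats, dim=-1)
    token_feats = F.normalize(token_feats, dim=-1)
    # Token-to-patch attention: which image regions each word attends to.
    sim = token_feats @ patch_feats.transpose(1, 2) / temperature  # (B, T, P)
    attn = sim.softmax(dim=-1).mean(dim=1)                         # (B, P)
    # KL divergence between model attention and recorded gaze density.
    return F.kl_div(attn.clamp_min(1e-8).log(), gaze_heatmap,
                    reduction="batchmean")

# Combined objective: implicit (contrastive) plus explicit (gaze) alignment,
# weighted by a hypothetical coefficient lam.
# loss = clip_contrastive_loss(img_emb, txt_emb) + lam * gaze_alignment_loss(...)
```

In this sketch the gaze term simply pulls the model's token-to-patch attention toward the recorded gaze density, which is one straightforward way to make the medical-context relationship explicit rather than purely data-driven.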




Action is in the Eye of the Beholder: Eye-gaze Driven Model for Spatio-Temporal Action Localization (Reviews)

Neural Information Processing Systems

"NIPS 2013 Neural Information Processing Systems December 5 - 10, Lake Tahoe, Nevada, USA",,, "Paper ID:","1139" "Title:","Action is in the Eye of the Beholder: Eye-gaze Driven Model for Spatio-Temporal Action Localization" Reviews First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. This paper proposes a method for action detection (localization and classification of actions) using weakly supervised information (action labels + eye gaze information, no explicit definition of bounding boxes). Overall, the spatio-temporal search (a huge spatio-temporal space) is done using dynamic programming and a max-path algorithm. Gaze information is introduced into framework through a loss which acounts for gaze density at a given location. QUALITY: The paper seems technically sound and makes for a nice study given gaze information.


EG-SpikeFormer: Eye-Gaze Guided Transformer on Spiking Neural Networks for Medical Image Analysis

Pan, Yi, Jiang, Hanqi, Chen, Junhao, Li, Yiwei, Zhao, Huaqin, Zhou, Yifan, Shu, Peng, Wu, Zihao, Liu, Zhengliang, Zhu, Dajiang, Li, Xiang, Abate, Yohannes, Liu, Tianming

arXiv.org Artificial Intelligence

Neuromorphic computing has emerged as a promising energy-efficient alternative to traditional artificial intelligence, predominantly realized as spiking neural networks (SNNs) implemented on neuromorphic hardware. Significant advances have been made in SNN-based convolutional neural networks (CNNs) and Transformer architectures. However, neuromorphic computing for the medical imaging domain remains underexplored. In this study, we introduce EG-SpikeFormer, an SNN architecture tailored for clinical tasks that incorporates eye-gaze data to guide the model's attention to diagnostically relevant regions in medical images. This approach effectively addresses the shortcut-learning issues commonly observed in conventional models, especially in scenarios with limited clinical data and high demands on model reliability, generalizability, and transparency. EG-SpikeFormer not only demonstrates superior energy efficiency and performance in medical image prediction tasks but also enhances clinical relevance through multi-modal information alignment. By incorporating eye-gaze data, the model improves interpretability and generalization, opening new directions for applying neuromorphic computing in healthcare.
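As a rough illustration of the gaze-guidance idea, the sketch below biases standard self-attention logits toward patches with high gaze density. It deliberately omits the spiking machinery (a real SNN would replace softmax attention with spike-based operations trained via surrogate gradients); the class name and the log-prior bias are assumptions, not EG-SpikeFormer's architecture.

```python
# Hedged PyTorch sketch: add a gaze-derived prior to attention logits so
# that keys under the radiologist's gaze receive higher attention weight.
import torch
import torch.nn as nn

class GazeBiasedAttention(nn.Module):
    def __init__(self, dim, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x, gaze_density):
        """x:            (B, P, D) patch tokens
           gaze_density: (B, P) gaze probability per patch (sums to 1)"""
        # Additive log-prior bias on the keys, broadcast over heads/queries.
        bias = torch.log(gaze_density.clamp_min(1e-6))       # (B, P)
        B, P = bias.shape
        h = self.attn.num_heads
        mask = bias[:, None, None, :].expand(B, h, P, P)     # (B, h, P, P)
        mask = mask.reshape(B * h, P, P)                     # float attn_mask
        out, _ = self.attn(x, x, x, attn_mask=mask)
        return out
```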

