DMAGaze: Gaze Estimation Based on Feature Disentanglement and Multi-Scale Attention

Haohan Chen, Hongjia Liu, Shiyong Lan, Wenwu Wang, Yixin Qiao, Yao Li, Guonan Deng

arXiv.org Artificial Intelligence 

Gaze estimation, which predicts gaze direction, commonly faces the challenge of interference from complex gaze-irrelevant information in face images. In this work, we propose DMAGaze, a novel gaze estimation framework that exploits information from facial images in three aspects: gaze-relevant global features (disentangled from the facial image), local eye features (extracted from cropped eye patches), and head pose estimation features, to improve overall performance. Furthermore, we introduce a new cascaded attention module named the Multi-Scale Global Local Attention Module (MS-GLAM). Through a customized cascaded attention structure, it effectively focuses on global and local information at multiple scales, further enhancing the information from the Disentangler. Finally, the global gaze-relevant features disentangled by the upper face branch, combined with head pose and local eye features, are passed through the detection head for high-precision gaze estimation. Our proposed DMAGaze has been extensively validated on two mainstream public datasets, achieving state-of-the-art performance.

Keywords: gaze estimation, feature disentanglement, Gaussian similarity, multi-scale attention

1. Introduction

Gaze estimation, the task of predicting gaze direction, is crucial for measuring human attention and is widely applied in areas such as saliency detection [1, 2], virtual reality [3], driver distraction monitoring [4], human-computer interaction [5] and autism diagnosis [6]. Recently, gaze estimation has shifted from model-based methods to appearance-based methods.
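The three-stream fusion described in the abstract can be sketched as follows. This is a minimal illustration only: the feature dimensions, the weight initialization, and the single linear detection head are assumptions for the sketch, not the paper's actual architecture (which uses a Disentangler and the MS-GLAM attention module to produce these features).

```python
import numpy as np

# Hypothetical feature dimensions -- not specified in this section of the paper.
D_GLOBAL, D_EYE, D_POSE = 128, 64, 16

def detection_head(global_feat, eye_feat, pose_feat, W, b):
    """Fuse disentangled global facial features, local eye features, and
    head pose features, then regress a 2D gaze direction (pitch, yaw)."""
    fused = np.concatenate([global_feat, eye_feat, pose_feat])
    return W @ fused + b

rng = np.random.default_rng(0)
# Stand-in linear head; in practice this would be a learned MLP.
W = rng.standard_normal((2, D_GLOBAL + D_EYE + D_POSE)) * 0.01
b = np.zeros(2)

gaze = detection_head(rng.standard_normal(D_GLOBAL),
                      rng.standard_normal(D_EYE),
                      rng.standard_normal(D_POSE),
                      W, b)
print(gaze.shape)
```

The key design point conveyed by the abstract is that gaze-irrelevant facial information is removed before fusion, so the global stream contributes only gaze-relevant cues alongside the eye and head-pose streams.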