CSGaze: Context-aware Social Gaze Prediction
Madan, Surbhi, Ghosh, Shreya, Subramanian, Ramanathan, Dhall, Abhinav, Gedeon, Tom
arXiv.org Artificial Intelligence
A person's gaze offers valuable insights into their focus of attention, level of social engagement, and confidence. In this work, we investigate how contextual cues, combined with visual scene and facial information, can be effectively utilized to predict and interpret social gaze patterns during conversational interactions. We introduce CSGaze, a context-aware multimodal approach that leverages facial and scene information as complementary inputs to enhance social gaze pattern prediction from multi-person images. The model also incorporates a fine-grained attention mechanism centered on the principal speaker, which helps in better modeling social gaze dynamics. Experimental results show that CSGaze performs competitively with state-of-the-art methods on GP-Static, UCO-LAEO, and AVA-LAEO. Our findings highlight the role of contextual cues in improving social gaze prediction. Additionally, we provide initial explainability through generated attention scores, offering insights into the model's decision-making process. We also demonstrate our model's generalizability by testing it on open-set datasets, demonstrating its robustness across diverse scenarios.
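The abstract does not give implementation details, but the described fusion of facial and scene features with a speaker-centered attention mechanism can be sketched roughly as follows. All tensor shapes, the `fuse_with_speaker_attention` helper, and the dot-product attention form are illustrative assumptions, not the authors' actual architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse_with_speaker_attention(face_feats, scene_feat, speaker_idx):
    """Hypothetical fusion: attend over per-person facial embeddings using
    the principal speaker's embedding as the query, then concatenate the
    attended summary with a global scene embedding.

    face_feats : (n_people, d) per-person facial embeddings
    scene_feat : (d,) global scene embedding
    speaker_idx: index of the principal speaker
    """
    query = face_feats[speaker_idx]                          # (d,)
    scores = face_feats @ query / np.sqrt(face_feats.shape[1])
    attn = softmax(scores)                                   # (n_people,)
    context = attn @ face_feats                              # (d,) attended summary
    fused = np.concatenate([context, scene_feat])            # (2d,) joint representation
    # attn doubles as an inspectable attention score, loosely analogous
    # to the explainability signal the abstract mentions.
    return fused, attn

# Toy example: 3 people with 4-dim embeddings; person 0 is the speaker.
faces = np.random.default_rng(0).normal(size=(3, 4))
scene = np.zeros(4)
fused, attn = fuse_with_speaker_attention(faces, scene, speaker_idx=0)
```

In a full model, the fused vector would feed a classifier over social gaze patterns; here it simply shows how speaker-centered attention yields both a joint representation and per-person scores.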
Nov-17-2025