Unified Attention Modeling for Efficient Free-Viewing and Visual Search via Shared Representations
Fatma Youssef Mohammed, Kostas Alexis
arXiv.org Artificial Intelligence
Computational modeling of human attention in free-viewing and task-specific settings is often studied separately, with limited exploration of whether a common representation exists between them. This work investigates that question and proposes a neural network architecture, built upon the Human Attention Transformer (HAT), to test the hypothesis. Our results demonstrate that free-viewing and visual search can efficiently share a common representation: a model trained on free-viewing attention transfers its knowledge to task-driven visual search with a performance drop of only 3.86% in the predicted fixation scanpaths, measured by the semantic sequence score (SemSS), a metric that reflects the similarity between predicted and human scanpaths. This transfer reduces computational cost by 92.29% in GFLOPs and 31.23% in trainable parameters.
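The transfer scheme the abstract describes can be illustrated with a minimal sketch: a shared encoder learned on one task is frozen, and only a small task-specific head is trained for the second task, which is where the trainable-parameter savings come from. The dimensions, layer shapes, and plain matrix layers below are hypothetical placeholders, not the actual HAT-based architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions; the real HAT-based model's sizes are not given here.
D_IN, D_SHARED, D_OUT = 64, 32, 16

# Shared representation: one encoder reused by both attention tasks.
W_enc = rng.standard_normal((D_IN, D_SHARED))
# Task-specific heads: one for free-viewing, one for visual search.
W_free = rng.standard_normal((D_SHARED, D_OUT))
W_search = rng.standard_normal((D_SHARED, D_OUT))

def encode(x):
    """Shared encoder applied to an input feature vector."""
    return np.tanh(x @ W_enc)

def predict(x, head):
    """Task-specific prediction on top of the shared representation."""
    return encode(x) @ head

x = rng.standard_normal(D_IN)
free_out = predict(x, W_free)      # free-viewing prediction
search_out = predict(x, W_search)  # visual-search prediction

# Transfer setup: freeze the shared encoder, train only the search head.
trainable_full = W_enc.size + W_search.size   # training everything
trainable_transfer = W_search.size            # training the head alone
reduction = 1 - trainable_transfer / trainable_full
print(f"trainable-parameter reduction: {reduction:.2%}")
```

With these toy shapes the frozen-encoder setup trains only 512 of 2,560 parameters, an 80% reduction; the paper's reported 31.23% reduction reflects the real architecture's split between shared and task-specific components.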
Jun 4, 2025