AITopics | scanpath

Collaborating Authors

scanpath

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

DiffEye: Diffusion-Based Continuous Eye-Tracking Data Generation Conditioned on Natural Images

Neural Information Processing SystemsJun-22-2026, 23:53:13 GMT

Numerous models have been developed for scanpath and saliency prediction, which are typically trained on scanpaths, which model eye movement as a sequence of discrete fixation points connected by saccades, while the rich information contained in the raw trajectories is often discarded. Moreover, most existing approaches fail to capture the variability observed among human subjects viewing the same image. They generally predict a single scanpath of fixed, pre-defined length, which conflicts with the inherent diversity and stochastic nature of real-world visual attention. To address these challenges, we propose DiffEye, a diffusion-based training framework designed to model continuous and diverse eye movement trajectories during free viewing of natural images. Our method builds on a diffusion model conditioned on visual stimuli and introduces a novel component, namely Corresponding Positional Embedding (CPE), which aligns spatial gaze information with the patch-based semantic features of the visual input. By leveraging raw eye-tracking trajectories rather than relying on scanpaths, DiffEyecaptures the inherent variability in human gaze behavior and generates high-quality, realistic eye movement patterns, despite being trained on a comparatively small dataset. The generated trajectories can also be converted into scanpaths and saliency maps, resulting in outputs that more accurately reflect the distribution of human visual attention. DiffEyeis the first method to tackle this task on natural images using a diffusion model while fully leveraging the richness of raw eye-tracking data. Our extensive evaluation shows that DiffEyenot only achieves state-of-the-art performance in scanpath generation but also enables, for the first time, the generation of continuous eye movement trajectories.

artificial intelligence, machine learning, trajectory, (18 more...)

Neural Information Processing Systems

Country: North America > United States (0.67)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.94)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

What Moves the Eyes: Doubling Mechanistic Model Performance Using Deep Networks to Discover and Test Cognitive Hypotheses

Neural Information Processing SystemsJun-17-2026, 06:52:46 GMT

Understanding how humans move their eyes to gather visual information is a central question in neuroscience, cognitive science, and vision research. While recent deep learning (DL) models achieve state-of-the-art performance in predicting human scanpaths, their underlying decision processes remain opaque. At an opposite end of the modeling spectrum, cognitively inspired mechanistic models aim to explain scanpath behavior through interpretable cognitive mechanisms but lag far behind in predictive accuracy. In this work, we bridge this gap by using a high-performing deep model--DeepGaze III--to discover and test mechanisms that improve a leading mechanistic model, SceneWalk. By identifying individual fixations where DeepGaze III succeeds and SceneWalk fails, we isolate behaviorally meaningful discrepancies and use them to motivate targeted extensions of the mechanistic framework. These include time-dependent temperature scaling, saccadic momentum and an adaptive cardinal attention bias: Simple, interpretable additions that substantially boost predictive performance. With these extensions, SceneWalk's explained variance on the MIT1003 dataset doubles from 35% to 70%, setting a new state of the art in mechanistic scanpath prediction. Our findings show how performance-optimized neural networks can serve as tools for cognitive model discovery, offering a new path toward interpretable and high-performing models of visual behavior.

artificial intelligence, bit fix, machine learning, (20 more...)

Neural Information Processing Systems

Country:

North America > United States (0.68)
Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Add feedback

DiffEye: Diffusion-Based Continuous Eye-Tracking Data Generation Conditioned on Natural Images

Neural Information Processing SystemsJun-14-2026, 06:12:07 GMT

Numerous models have been developed for scanpath and saliency prediction, which are typically trained on scanpaths, which model eye movement as a sequence of discrete fixation points connected by saccades, while the rich information contained in the raw trajectories is often discarded. Moreover, most existing approaches fail to capture the variability observed among human subjects viewing the same image. They generally predict a single scanpath of fixed, pre-defined length, which conflicts with the inherent diversity and stochastic nature of real-world visual attention. To address these challenges, we propose DiffEye, a diffusion-based training framework designed to model continuous and diverse eye movement trajectories during free viewing of natural images. Our method builds on a diffusion model conditioned on visual stimuli and introduces a novel component, namely Corresponding Positional Embedding (CPE), which aligns spatial gaze information with the patch-based semantic features of the visual input. By leveraging raw eye-tracking trajectories rather than relying on scanpaths, DiffEye captures the inherent variability in human gaze behavior and generates high-quality, realistic eye movement patterns, despite being trained on a comparatively small dataset. The generated trajectories can also be converted into scanpaths and saliency maps, resulting in outputs that more accurately reflect the distribution of human visual attention. DiffEye is the first method to tackle this task on natural images using a diffusion model while fully leveraging the richness of raw eye-tracking data. Our extensive evaluation shows that DiffEye not only achieves state-of-the-art performance in scanpath generation but also enables, for the first time, the generation of continuous eye movement trajectories.

artificial intelligence, machine learning, proceedings, (12 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Cognitive Science (0.86)
Information Technology > Artificial Intelligence > Vision (0.84)
Information Technology > Artificial Intelligence > Machine Learning (0.61)

Add feedback

From Human Attention to Diagnosis: Semantic Patch-Level Integration of Vision-Language Models in Medical Imaging

Neural Information Processing SystemsJun-14-2026, 02:37:31 GMT

Predicting human eye movements during goal-directed visual search is critical for enhancing interactive AI systems. In medical imaging, such prediction can support radiologists in interpreting complex data, such as chest X-rays. Many existing methods rely on generic vision--language models and saliency-based features, which can limit their ability to capture fine-grained clinical semantics and integrate domain knowledge effectively. We present \textbf{LogitGaze-Med}, a state-of-the-art multimodal transformer framework that unifies (1) domain-specific visual encoders (e.g., CheXNet), (2) textual embeddings of diagnostic labels, and (3) semantic priors extracted via the logit-lens from an instruction-tuned medical vision--language model (LLaVA-Med). By directly predicting continuous fixation coordinates and dwell durations, our model generates clinically meaningful scanpaths. Experiments on the GazeSearch dataset and synthetic scanpaths generated from MIMIC-CXR and validated by experts demonstrate that LogitGaze-Med improves scanpath similarity metrics by 20--30\% over competitive baselines and yields over 5\% gains in downstream pathology classification when incorporating predicted fixations as additional training data.

artificial intelligence, name change, proceedings, (3 more...)

Neural Information Processing Systems

Industry: Health & Medicine > Diagnostic Medicine > Imaging (0.64)

Technology: Information Technology > Artificial Intelligence > Vision (1.00)

Add feedback

Assessor360: Multi-sequence Network for Blind Omnidirectional Image Quality Assessment

Neural Information Processing SystemsApr-29-2026, 19:21:09 GMT

Blind Omnidirectional Image Quality Assessment (BOIQA) aims to objectively assess the human perceptual quality of omnidirectional images (ODIs) without relying on pristine-quality image information. It is becoming more significant with the increasing advancement of virtual reality (VR) technology. However, the quality assessment of ODIs is severely hampered by the fact that the existing BOIQA pipeline lacks the modeling of the observer's browsing process. To tackle this issue, we propose a novel multi-sequence network for BOIQA called Assessor360, which is derived from the realistic multi-assessor ODI quality assessment procedure. Specifically, we propose a generalized Recursive Probability Sampling (RPS) method for the BOIQA task, combining content and details information to generate multiple pseudo viewport sequences from a given starting point.

artificial intelligence, human computer interaction, machine learning, (20 more...)

Neural Information Processing Systems

Country: Asia (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Human Computer Interaction > Interfaces > Virtual Reality (1.00)
Information Technology > Artificial Intelligence > Vision (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

bff09ce4b210b185a265c9bcd58048bb-Paper-Conference.pdf

Neural Information Processing SystemsFeb-17-2026, 22:20:26 GMT

large language model, machine learning, natural language, (22 more...)

Neural Information Processing Systems

Country:

Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)
North America > United States > California > San Diego County > San Diego (0.04)
Europe > Russia (0.04)
Asia > Russia (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry:

Information Technology (0.67)
Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
(6 more...)

Add feedback

Assessor360: Multi-sequence Network for Blind Omnidirectional Image Quality Assessment Tianhe Wu

Neural Information Processing SystemsFeb-17-2026, 03:59:10 GMT

Extensive experimental results demonstrate that Assessor360 outperforms state-of-the-art methods on multiple OIQA datasets.

artificial intelligence, human computer interaction, machine learning, (19 more...)

Neural Information Processing Systems

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)
North America > United States > Maryland > Prince George's County > College Park (0.04)
Europe > Netherlands > Drenthe > Assen (0.04)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Vision (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Human Computer Interaction > Interfaces (0.68)

Add feedback

4ea14e6090343523ddcd5d3ca449695f-Paper-Datasets_and_Benchmarks.pdf

Neural Information Processing SystemsFeb-8-2026, 20:44:26 GMT

Thus, there is a need for a reference point, on which each model canbetested andfrom where potential improvements canbe derived. In this study, we select publicly available state-of-the-art visual search models and datasets in natural scenes, and provide a common framework for their evaluation. To this end, we apply a unified format and criteria, bridging the gaps between them, and we estimate the models' efficiency and similarity with humans using a specific set of metrics.

artificial intelligence, machine learning, participant, (19 more...)

Neural Information Processing Systems

Country:

South America > Argentina (0.06)
Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.69)
Information Technology > Sensing and Signal Processing > Image Processing (0.46)

Add feedback

691dcb1d65f31967a874d18383b9da75-AuthorFeedback.pdf

Neural Information Processing SystemsFeb-8-2026, 18:16:53 GMT

background, detectability, saccade, (15 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Filters

Collaborating Authors

scanpath

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

DiffEye: Diffusion-Based Continuous Eye-Tracking Data Generation Conditioned on Natural Images

What Moves the Eyes: Doubling Mechanistic Model Performance Using Deep Networks to Discover and Test Cognitive Hypotheses

DiffEye: Diffusion-Based Continuous Eye-Tracking Data Generation Conditioned on Natural Images

From Human Attention to Diagnosis: Semantic Patch-Level Integration of Vision-Language Models in Medical Imaging

Assessor360: Multi-sequence Network for Blind Omnidirectional Image Quality Assessment

bff09ce4b210b185a265c9bcd58048bb-Paper-Conference.pdf

Assessor360: Multi-sequence Network for Blind Omnidirectional Image Quality Assessment Tianhe Wu

4ea14e6090343523ddcd5d3ca449695f-Supplemental-Datasets_and_Benchmarks.pdf

4ea14e6090343523ddcd5d3ca449695f-Paper-Datasets_and_Benchmarks.pdf

691dcb1d65f31967a874d18383b9da75-AuthorFeedback.pdf