fixation
- North America > United States > California > San Francisco County > San Francisco (0.29)
- North America > United States > Michigan > Washtenaw County > Ann Arbor (0.15)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States > California > San Francisco County > San Francisco (0.28)
- North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
- Europe > Switzerland > Zürich > Zürich (0.14)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- Health & Medicine > Therapeutic Area > Neurology (1.00)
- Health & Medicine > Health Care Technology (0.94)
- Health & Medicine > Diagnostic Medicine > Imaging (0.68)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science (0.95)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
- South America > Paraguay > Asunción > Asunción (0.04)
- North America > United States > Texas > Travis County > Austin (0.04)
- North America > United States > California (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
A Dual-Stream Neural Network Explains the Functional Segregation of Dorsal and Ventral Visual Pathways in Human Brains
The human visual system uses two parallel pathways for spatial processing and object recognition. In contrast, computer vision systems tend to use a single feedforward pathway, rendering them less robust, adaptive, or efficient than human vision. To bridge this gap, we developed a dual-stream vision model inspired by the human eyes and brain. At the input level, the model samples two complementary visual patterns to mimic how the human eyes use magnocellular and parvocellular retinal ganglion cells to separate retinal inputs to the brain. At the backend, the model processes the separate input patterns through two branches of convolutional neural networks (CNN) to mimic how the human brain uses the dorsal and ventral cortical pathways for parallel visual processing.
StreamGaze: Gaze-Guided Temporal Reasoning and Proactive Understanding in Streaming Videos
Lee, Daeun, Mukherjee, Subhojyoti, Kveton, Branislav, Rossi, Ryan A., Lai, Viet Dac, Yoon, Seunghyun, Bui, Trung, Dernoncourt, Franck, Bansal, Mohit
Streaming video understanding requires models not only to process temporally incoming frames, but also to anticipate user intention for realistic applications like AR glasses. While prior streaming benchmarks evaluate temporal reasoning, none measure whether MLLMs can interpret or leverage human gaze signals within a streaming setting. To fill this gap, we introduce StreamGaze, the first benchmark designed to evaluate how effectively MLLMs use gaze for temporal and proactive reasoning in streaming videos. StreamGaze introduces gaze-guided past, present, and proactive tasks that comprehensively evaluate streaming video understanding. These tasks assess whether models can use real-time gaze to follow shifting attention and infer user intentions from only past and currently observed frames. To build StreamGaze, we develop a gaze-video QA generation pipeline that aligns egocentric videos with raw gaze trajectories via fixation extraction, region-specific visual prompting, and scanpath construction. This pipeline produces spatio-temporally grounded QA pairs that closely reflect human perceptual dynamics. Across all StreamGaze tasks, we observe substantial performance gaps between state-of-the-art MLLMs and human performance, revealing fundamental limitations in gaze-based temporal reasoning, intention modeling, and proactive prediction. We further provide detailed analyses of gaze-prompting strategies, reasoning behaviors, and task-specific failure modes, offering deeper insight into why current MLLMs struggle and what capabilities future models must develop. All data and code will be publicly released to support continued research in gaze-guided streaming video understanding.
Boosting Reinforcement Learning in 3D Visuospatial Tasks Through Human-Informed Curriculum Design
Solbach, Markus D., Tsotsos, John K.
Reinforcement Learning is a mature technology, often suggested as a potential route towards Artificial General Intelligence, with the ambitious goal of replicating the wide range of abilities found in natural and artificial intelligence, including the complexities of human cognition. While RL had shown successes in relatively constrained environments, such as the classic Atari games and specific continuous control problems, recent years have seen efforts to expand its applicability. This work investigates the potential of RL in demonstrating intelligent behaviour and its progress in addressing more complex and less structured problem domains. We present an investigation into the capacity of modern RL frameworks in addressing a seemingly straightforward 3D Same-Different visuospatial task. While initial applications of state-of-the-art methods, including PPO, behavioural cloning and imitation learning, revealed challenges in directly learning optimal strategies, the successful implementation of curriculum learning offers a promising avenue. Effective learning was achieved by strategically designing the lesson plan based on the findings of a real-world human experiment.
- Research Report > New Finding (0.46)
- Research Report > Promising Solution (0.34)
- Leisure & Entertainment > Games > Computer Games (0.68)
- Education (0.68)
- Health & Medicine > Pharmaceuticals & Biotechnology (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science (1.00)
- North America > United States > New York > Suffolk County > Stony Brook (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Europe > France (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
Fusing Foveal Fixations Using Linear Retinal Transformations and Bayesian Experimental Design
Humans (and many vertebrates) face the problem of fusing together multiple fixations of a scene in order to obtain a representation of the whole, where each fixation uses a high-resolution fovea and decreasing resolution in the periphery. In this paper we explicitly represent the retinal transformation of a fixation as a linear downsampling of a high-resolution latent image of the scene, exploiting the known geometry. This linear transformation allows us to carry out exact inference for the latent variables in factor analysis (FA) and mixtures of FA models of the scene. Further, this allows us to formulate and solve the choice of "where to look next" as a Bayesian experimental design problem using the Expected Information Gain criterion. Experiments on the Frey faces and MNIST datasets demonstrate the effectiveness of our models.
- North America > Canada > Ontario > Toronto (0.14)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)