AITopics | visual processing

Modulating early visual processing by language

Neural Information Processing SystemsMar-17-2026, 15:08:54 GMT

It is commonly assumed that language refers to high-level visual concepts while leaving low-level visual processing unaffected. This view dominates the current literature in computational models for language-vision tasks, where visual and linguistic inputs are mostly processed independently before being fused into a single representation. In this paper, we deviate from this classic pipeline and propose to modulate the \emph{entire visual processing} by a linguistic input. Specifically, we introduce Conditional Batch Normalization (CBN) as an efficient mechanism to modulate convolutional feature maps by a linguistic embedding. We apply CBN to a pre-trained Residual Network (ResNet), leading to the MODulatEd ResNet (\MRN) architecture, and show that this significantly improves strong baselines on two visual question answering tasks. Our ablation study confirms that modulating from the early stages of the visual processing is beneficial.

artificial intelligence, name change, proceedings, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Vision (1.00)

Add feedback

748d6b6ed8e13f857ceaa6cfbdca14b8-AuthorFeedback.pdf

Neural Information Processing SystemsFeb-12-2026, 15:06:25 GMT

invariant, revision, trial-to-trial variability and non-stationarity, (15 more...)

Neural Information Processing Systems

Industry: Health & Medicine > Therapeutic Area > Neurology (0.51)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.31)
Information Technology > Artificial Intelligence > Cognitive Science (0.31)

Add feedback

Modulating early visual processing by language

Neural Information Processing SystemsNov-21-2025, 15:36:57 GMT

It is commonly assumed that language refers to high-level visual concepts while leaving low-level visual processing unaffected. This view dominates the current literature in computational models for language-vision tasks, where visual and linguistic inputs are mostly processed independently before being fused into a single representation. In this paper, we deviate from this classic pipeline and propose to modulate the \emph{entire visual processing} by a linguistic input. Specifically, we introduce Conditional Batch Normalization (CBN) as an efficient mechanism to modulate convolutional feature maps by a linguistic embedding. We apply CBN to a pre-trained Residual Network (ResNet), leading to the MODulatEd ResNet (\MRN) architecture, and show that this significantly improves strong baselines on two visual question answering tasks. Our ablation study confirms that modulating from the early stages of the visual processing is beneficial.

modulating early visual processing, name change, visual processing, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Vision (1.00)

Add feedback

Modulating early visual processing by language

Harm de Vries, Florian Strub, Jeremie Mary, Hugo Larochelle, Olivier Pietquin, Aaron C. Courville

Neural Information Processing SystemsNov-21-2025, 10:51:51 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Country:

North America > Canada > Quebec > Montreal (0.04)
Europe > France > Hauts-de-France > Pas-de-Calais (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.96)

Add feedback

Reviewer

Neural Information Processing SystemsOct-3-2025, 00:32:01 GMT

There is some knowledge from anatomy (Harris, et al, 2019).

artificial intelligence, machine learning, revision, (16 more...)

Neural Information Processing Systems

Industry: Health & Medicine > Therapeutic Area > Neurology (0.51)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.31)
Information Technology > Artificial Intelligence > Cognitive Science (0.31)

Add feedback

Modulating early visual processing by language

Harm de Vries, Florian Strub, Jeremie Mary, Hugo Larochelle, Olivier Pietquin, Aaron C. Courville

Neural Information Processing SystemsOct-3-2024, 15:16:37 GMT

It is commonly assumed that language refers to high-level visual concepts while leaving low-level visual processing unaffected. This view dominates the current literature in computational models for language-vision tasks, where visual and linguistic inputs are mostly processed independently before being fused into a single representation. In this paper, we deviate from this classic pipeline and propose to modulate the entire visual processing by a linguistic input. Specifically, we introduce Conditional Batch Normalization (CBN) as an efficient mechanism to modulate convolutional feature maps by a linguistic embedding. We apply CBN to a pre-trained Residual Network (ResNet), leading to the MODulatEd ResNet (MODERN) architecture, and show that this significantly improves strong baselines on two visual question answering tasks. Our ablation study confirms that modulating from the early stages of the visual processing is beneficial.

architecture, modern, representation, (13 more...)

Neural Information Processing Systems

Country:

North America > Canada > Quebec > Montreal (0.04)
Europe > France > Hauts-de-France > Pas-de-Calais (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

pAE: An Efficient Autoencoder Architecture for Modeling the Lateral Geniculate Nucleus by Integrating Feedforward and Feedback Streams in Human Visual System

Gorji, Moslem, Ranjbar, Amin, Menhaj, Mohammad Bagher

arXiv.org Artificial IntelligenceSep-20-2024

The visual cortex is a vital part of the brain, responsible for hierarchically identifying objects. Understanding the role of the lateral geniculate nucleus (LGN) as a prior region of the visual cortex is crucial when processing visual information in both bottom-up and top-down pathways. When visual stimuli reach the retina, they are transmitted to the LGN area for initial processing before being sent to the visual cortex for further processing. In this study, we introduce a deep convolutional model that closely approximates human visual information processing. We aim to approximate the function for the LGN area using a trained shallow convolutional model which is designed based on a pruned autoencoder (pAE) architecture. The pAE model attempts to integrate feed forward and feedback streams from/to the V1 area into the problem. This modeling framework encompasses both temporal and non-temporal data feeding modes of the visual stimuli dataset containing natural images captured by a fixed camera in consecutive frames, featuring two categories: images with animals (in motion), and images without animals. Subsequently, we compare the results of our proposed deep-tuned model with wavelet filter bank methods employing Gabor and biorthogonal wavelet functions. Our experiments reveal that the proposed method based on the deep-tuned model not only achieves results with high similarity in comparison with human benchmarks but also performs significantly better than other models. The pAE model achieves the final 99.26% prediction performance and demonstrates a notable improvement of around 28% over human results in the temporal mode.

lgn, pae model, pathway, (15 more...)

arXiv.org Artificial Intelligence

2409.13622

Country:

North America > United States (0.04)
Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
Asia > Middle East > Iran > Tehran Province > Tehran (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Mind's Eye: Image Recognition by EEG via Multimodal Similarity-Keeping Contrastive Learning

Chen, Chi-Sheng, Wei, Chun-Shu

arXiv.org Artificial IntelligenceJun-5-2024

Decoding images from non-invasive electroencephalographic (EEG) signals has been a grand challenge in understanding how the human brain process visual information in real-world scenarios. To cope with the issues of signal-to-noise ratio and nonstationarity, this paper introduces a MUltimodal Similarity-keeping contrastivE learning (MUSE) framework for zero-shot EEG-based image classification. We develop a series of multivariate time-series encoders tailored for EEG signals and assess the efficacy of regularized contrastive EEG-Image pretraining using an extensive visual EEG dataset. Our method achieves state-of-the-art performance, with a top-1 accuracy of 19.3% and a top-5 accuracy of 48.8% in 200-way zero-shot image classification. Furthermore, we visualize neural patterns via model interpretation, shedding light on the visual processing dynamics in the human brain. The code repository for this work is available at: https://github.com/ChiShengChen/MUSE_EEG.

contrastive learning, encoder, learning, (13 more...)

arXiv.org Artificial Intelligence

2406.1691

Country:

Asia > Taiwan (0.04)
Asia > Middle East > Republic of Türkiye (0.04)
Asia > Japan > Honshū > Tōhoku > Fukushima Prefecture > Fukushima (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Analyzing the Roles of Language and Vision in Learning from Limited Data

Chen, Allison, Sucholutsky, Ilia, Russakovsky, Olga, Griffiths, Thomas L.

arXiv.org Artificial IntelligenceMay-10-2024

Does language help make sense of the visual world? How important is it to actually see the world rather than having it described with words? These basic questions about the nature of intelligence have been difficult to answer because we only had one example of an intelligent system -- humans -- and limited access to cases that isolated language or vision. However, the development of sophisticated Vision-Language Models (VLMs) by artificial intelligence researchers offers us new opportunities to explore the contributions that language and vision make to learning about the world. We ablate components from the cognitive architecture of these models to identify their contributions to learning new tasks from limited data. We find that a language model leveraging all components recovers a majority of a VLM's performance, despite its lack of visual input, and that language seems to allow this by providing access to prior knowledge and reasoning.

architecture, knowledge, visual processing, (14 more...)

arXiv.org Artificial Intelligence

2403.19669

Country:

North America > United States (0.14)
Asia > China > Hong Kong (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

Add feedback

A lattice filter model of the visual pathway

Neural Information Processing SystemsMar-14-2024, 05:58:04 GMT

Early stages of visual processing are thought to decorrelate, or whiten, the incoming temporally varying signals. Motivated by the cascade structure of the visual pathway (retina lateral geniculate nucelus (LGN) primary visual cortex, V1) we propose to model its function using lattice filters - signal processing devices for stage-wise decorrelation of temporal signals. Lattice filter models predict neuronal responses consistent with physiological recordings in cats and primates. In particular, they predict temporal receptive fields of two different types resembling so-called lagged and non-lagged cells in the LGN. Moreover, connection weights in the lattice filter can be learned using Hebbian rules in a stage-wise sequential manner reminiscent of the neuro-developmental sequence in mammals.

lattice filter, prediction, receptive field, (13 more...)

Neural Information Processing Systems

Country: