AITopics | Sensing and Signal Processing

Revitalizing CNN Attentions via Transformers in Self-Supervised Visual Representation Learning Yibing Song

Neural Information Processing SystemsMay-28-2025, 12:38:57 GMT

Studies on self-supervised visual representation learning (SSL) improve encoder backbones to discriminate training samples without labels. While CNN encoders via SSL achieve comparable recognition performance to those via supervised learning, their network attention is under-explored for further improvement. Motivated by the transformers that explore visual attention effectively in recognition scenarios, we propose a CNN Attention REvitalization (CARE) framework to train attentive CNN encoders guided by transformers in SSL. The proposed CARE framework consists of a CNN stream (C-stream) and a transformer stream (T-stream), where each stream contains two branches. C-stream follows an existing SSL framework with two CNN encoders, two projectors, and a predictor.

artificial intelligence, machine learning, natural language, (12 more...)

Neural Information Processing Systems

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

Add feedback

CRAYM: Neural Field Optimization via Camera RAY Matching 1

Neural Information Processing SystemsMay-28-2025, 12:37:08 GMT

We introduce camera ray matching (CRAYM) into the joint optimization of camera poses and neural fields from multi-view images. The optimized field, referred to as a feature volume, can be "probed" by the camera rays for novel view synthesis (NVS) and 3D geometry reconstruction. One key reason for matching camera rays, instead of pixels as in prior works, is that the camera rays can be parameterized by the feature volume to carry both geometric and photometric information. Multi-view consistencies involving the camera rays and scene rendering can be naturally integrated into the joint optimization and network training, to impose physically meaningful constraints to improve the final quality of both the geometric reconstruction and photorealistic rendering. We formulate our per-ray optimization and matched ray coherence by focusing on camera rays passing through keypoints in the input images to elevate both the efficiency and accuracy of scene correspondences. Accumulated ray features along the feature volume provide a means to discount the coherence constraint amid erroneous ray matching. We demonstrate the effectiveness of CRAYM for both NVS and geometry reconstruction, over dense-or sparse-view settings, with qualitative and quantitative comparisons to state-of-the-art alternatives.

artificial intelligence, machine learning, natural language, (15 more...)

Neural Information Processing Systems

Country:

Asia (0.67)
North America > United States > Texas (0.14)

Genre: Research Report > Experimental Study (0.93)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Add feedback

20479c788fb27378c2c99eadcf207e7f-Paper.pdf

Neural Information Processing SystemsMay-28-2025, 12:14:14 GMT

artificial intelligence, computer vision, machine learning, (15 more...)

Neural Information Processing Systems

Country: Asia > Middle East (0.14)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

157c30da6a988e1cbef2095f7b9521db-Paper-Conference.pdf

Neural Information Processing SystemsMay-28-2025, 12:06:31 GMT

artificial intelligence, machine learning, natural language, (16 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Add feedback

On the Out-of-distribution Generalization of Probabilistic Image Modelling Steven McDonagh 2

Neural Information Processing SystemsMay-28-2025, 12:03:10 GMT

Out-of-distribution (OOD) detection and lossless compression constitute two problems that can be solved by the training of probabilistic models on a first dataset with subsequent likelihood evaluation on a second dataset, where data distributions differ. By defining the generalization of probabilistic models in terms of likelihood we show that, in the case of image models, the OOD generalization ability is dominated by local features.

artificial intelligence, arxiv preprint arxiv, machine learning, (17 more...)

Neural Information Processing Systems

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning

Neural Information Processing SystemsMay-28-2025, 11:43:12 GMT

Multi-Modal Large Language Models (MLLMs) have demonstrated impressive performance in various VQA tasks. However, they often lack interpretability and struggle with complex visual inputs, especially when the resolution of the input image is high or when the interested region that could provide key information for answering the question is small. To address these challenges, we collect and introduce the large-scale Visual CoT dataset comprising 438k question-answer pairs, annotated with intermediate bounding boxes highlighting key regions essential for answering the questions. Additionally, about 98k pairs of them are annotated with detailed reasoning steps. Importantly, we propose a multi-turn processing pipeline that dynamically focuses on visual inputs and provides interpretable thoughts. We also introduce the related benchmark to evaluate the MLLMs in scenarios requiring specific local region identification. Extensive experiments demonstrate the effectiveness of our framework and shed light on better inference strategies. The Visual CoT dataset, benchmark, and pre-trained models are available on this webpage to support further research in this area.

arxiv preprint arxiv, large language model, machine learning, (16 more...)

Neural Information Processing Systems

Country:

Asia > China (0.14)
North America > United States (0.14)
North America > Canada > Ontario > Toronto (0.14)

Genre:

Research Report (0.93)
Overview (0.67)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Add feedback

1d49780520898fe37f0cd6b41c5311bf-Paper.pdf

Neural Information Processing SystemsMay-28-2025, 11:42:08 GMT

artificial intelligence, corruption, machine learning, (18 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.88)

Industry: Information Technology (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

Add feedback

1cdf14d1e3699d61d237cf76ce1c2dca-Supplemental.pdf

Neural Information Processing SystemsMay-28-2025, 11:37:25 GMT

artificial intelligence, dataset, machine learning, (18 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.95)
Information Technology > Sensing and Signal Processing > Image Processing (0.87)

Add feedback

1cdf14d1e3699d61d237cf76ce1c2dca-Paper.pdf

Neural Information Processing SystemsMay-28-2025, 11:37:19 GMT

artificial intelligence, machine learning, representation, (18 more...)

Neural Information Processing Systems

Country: Europe > Germany (0.28)

Genre: Research Report (0.46)

Industry: Information Technology (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Truly Scale-Equivariant Deep Nets with Fourier Layers

Neural Information Processing SystemsMay-28-2025, 11:33:02 GMT

In computer vision, models must be able to adapt to changes in image resolution to effectively carry out tasks such as image segmentation; This is known as scaleequivariance. Recent works have made progress in developing scale-equivariant convolutional neural networks, e.g., through weight-sharing and kernel resizing. However, these networks are not truly scale-equivariant in practice. Specifically, they do not consider anti-aliasing as they formulate the down-scaling operation in the continuous domain. To address this shortcoming, we directly formulate downscaling in the discrete domain with consideration of anti-aliasing. We then propose a novel architecture based on Fourier layers to achieve truly scale-equivariant deep nets, i.e., absolute zero equivariance-error. Following prior works, we test this model on MNIST-scale and STL-10 datasets. Our proposed model achieves competitive classification performance while maintaining zero equivariance-error.

artificial intelligence, machine learning, proc, (18 more...)

Neural Information Processing Systems

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

Filters

Collaborating Authors

Sensing and Signal Processing

Revitalizing CNN Attentions via Transformers in Self-Supervised Visual Representation Learning Yibing Song

CRAYM: Neural Field Optimization via Camera RAY Matching 1

20479c788fb27378c2c99eadcf207e7f-Paper.pdf

157c30da6a988e1cbef2095f7b9521db-Paper-Conference.pdf

On the Out-of-distribution Generalization of Probabilistic Image Modelling Steven McDonagh 2

Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning

1d49780520898fe37f0cd6b41c5311bf-Paper.pdf

1cdf14d1e3699d61d237cf76ce1c2dca-Supplemental.pdf

1cdf14d1e3699d61d237cf76ce1c2dca-Paper.pdf

Truly Scale-Equivariant Deep Nets with Fourier Layers