AITopics | image descriptor

Find your Needle: Small Object Image Retrieval via Multi-Object Attention Optimization

Neural Information Processing Systems, Jun-13-2026, 21:11:21 GMT

We address the challenge of Small Object Image Retrieval (SoIR), where the goal is to retrieve images containing a specific small object in a cluttered scene. The key challenge in this setting is constructing a single image descriptor, for scalable and efficient search, that effectively represents all objects in the image. In this paper, we first analyze the limitations of existing methods on this challenging task and then introduce new benchmarks to support SoIR evaluation. Next, we introduce Multi-object Attention Optimization (MaO), a novel retrieval framework which incorporates a dedicated multi-object pre-training phase. This is followed by a refinement process that leverages attention-based feature extraction with object masks, integrating them into a single unified image descriptor. Our MaO approach significantly outperforms existing retrieval methods and strong baselines, achieving notable improvements in both zero-shot and lightweight multi-object fine-tuning settings. We hope this work will pave the way and inspire further research to enhance retrieval performance for this highly practical task.
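To make the idea of a single descriptor that represents every object more concrete, here is a minimal sketch. It is not the authors' method: it swaps the attention-based extraction for plain mask-weighted average pooling, and the names `masked_pool` and `image_descriptor`, the list-of-lists feature layout, and the equal-weight fusion are all assumptions made for illustration.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def masked_pool(patch_features, mask):
    """Average the patch features selected by a binary object mask.

    patch_features: list of N feature vectors, one per image patch (assumed layout).
    mask: list of N values, truthy where the patch belongs to the object.
    """
    selected = [f for f, m in zip(patch_features, mask) if m]
    dim = len(patch_features[0])
    if not selected:
        return [0.0] * dim
    return [sum(f[d] for f in selected) / len(selected) for d in range(dim)]

def image_descriptor(patch_features, object_masks):
    """Fuse per-object descriptors into one unit-length image descriptor.

    Assumes at least one object mask. Each object is normalized before
    fusion, so an object covering few patches counts as much as a large one.
    """
    per_object = [l2_normalize(masked_pool(patch_features, m)) for m in object_masks]
    dim = len(patch_features[0])
    fused = [sum(o[d] for o in per_object) / len(per_object) for d in range(dim)]
    return l2_normalize(fused)

# Three patches with 2-D features; two objects, each covering one patch.
patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
masks = [[1, 0, 0], [0, 1, 0]]
descriptor = image_descriptor(patches, masks)
```

Normalizing each object before averaging is one simple way to keep a small object from being drowned out by larger ones; the paper's attention optimization presumably addresses the same concern in a learned way.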

artificial intelligence, name change, proceedings, (3 more...)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (0.87)
Information Technology > Artificial Intelligence (0.78)

BinGAN: Learning Compact Binary Descriptors with a Regularized GAN

Neural Information Processing Systems, Mar-17-2026, 02:07:13 GMT

In this paper, we propose a novel regularization method for Generative Adversarial Networks that allows the model to learn discriminative yet compact binary representations of image patches (image descriptors). We exploit the dimensionality reduction that takes place in the intermediate layers of the discriminator network and train the binarized penultimate layer's low-dimensional representation to mimic the distribution of the higher-dimensional preceding layers. To achieve this, we introduce two loss terms that aim at: (i) reducing the correlation between the dimensions of the binarized penultimate layer's low-dimensional representation (i.e.

artificial intelligence, machine learning, proceedings, (6 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.65)

BinGAN: Learning Compact Binary Descriptors with a Regularized GAN

Maciej Zieba, Piotr Semberecki, Tarek El-Gaaly, Tomasz Trzcinski

Neural Information Processing Systems, Mar-16-2026, 14:46:35 GMT

Neural Information Processing Systems http://nips.cc/

descriptor, discriminator, representation, (16 more...)

Country:

Europe > Poland > Lower Silesia Province > Wroclaw (0.04)
North America > Canada > Quebec > Montreal (0.04)
Europe > Poland > Masovia Province > Warsaw (0.04)

Genre: Research Report (0.68)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

BinGAN: Learning Compact Binary Descriptors with a Regularized GAN

Maciej Zieba, Piotr Semberecki, Tarek El-Gaaly, Tomasz Trzcinski

Neural Information Processing Systems, Feb-15-2026, 06:19:38 GMT

We adjust this regularization term so that it concentrates on increasing the entropy of the particular pairs of binary vectors that are not correlated in high-dimensional space.

artificial intelligence, descriptor, machine learning, (20 more...)

Country: North America > Canada > Quebec > Montreal (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Practical Deep Stereo (PDS): Toward applications-friendly deep stereo matching

Stepan Tulyakov, Anton Ivanov, François Fleuret

Neural Information Processing Systems, Feb-14-2026, 03:44:28 GMT

Neural Information Processing Systems http://nips.cc/

disparity, disparity range, module, (16 more...)

Country:

Europe > Switzerland > Vaud > Lausanne (0.05)
North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report (0.47)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners

Neural Information Processing Systems, Dec-24-2025, 01:06:17 GMT

The goal of this work is to build flexible video-language models that can generalize to various video-to-text tasks from few examples. Existing few-shot video-language learners focus exclusively on the encoder, resulting in the absence of a video-to-text decoder to handle generative tasks. Video captioners have been pretrained on large-scale video-language datasets, but they rely heavily on finetuning and lack the ability to generate text for unseen tasks in a few-shot setting. We propose VidIL, a few-shot Video-language Learner via Image and Language models, which demonstrates strong performance on few-shot video-to-text tasks without the necessity of pretraining or finetuning on any video datasets. We use image-language models to translate the video content into frame captions, object, attribute, and event phrases, and compose them into a temporal-aware template. We then instruct a language model, with a prompt containing a few in-context examples, to generate a target output from the composed content. The flexibility of prompting allows the model to capture any form of text input, such as automatic speech recognition (ASR) transcripts. Our experiments demonstrate the power of language models in understanding videos on a wide variety of video-language tasks, including video captioning, video question answering, video caption retrieval, and video future event prediction.
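The prompting pipeline described above can be sketched as plain string composition. This is an illustrative sketch only: the temporal markers ("First", "Then", "Finally"), the `Output:` field, and the function names `temporal_template` and `build_prompt` are hypothetical choices, not the template used by VidIL, and the captions are assumed to come from an image-language model that is not shown.

```python
def temporal_template(frame_captions):
    """Join per-frame captions into one passage with temporal markers."""
    parts = []
    last = len(frame_captions) - 1
    for i, caption in enumerate(frame_captions):
        if i == 0:
            marker = "First"
        elif i == last:
            marker = "Finally"
        else:
            marker = "Then"
        parts.append(f"{marker}, {caption}.")
    return " ".join(parts)

def build_prompt(instruction, examples, frame_captions):
    """Build a few-shot prompt: instruction, in-context examples, then the query.

    examples: list of (frame_captions, target_text) pairs.
    The prompt ends with an open "Output:" for the language model to complete.
    """
    blocks = [instruction]
    for example_captions, target in examples:
        blocks.append(f"{temporal_template(example_captions)}\nOutput: {target}")
    blocks.append(f"{temporal_template(frame_captions)}\nOutput:")
    return "\n\n".join(blocks)

query = ["a man opens a fridge", "he takes out milk", "he pours a glass"]
example = (["a dog runs to a ball", "it brings the ball back"], "A dog plays fetch.")
prompt = build_prompt("Describe the video in one sentence.", [example], query)
```

Because the language model only ever sees text, extra inputs such as ASR transcripts could be appended to the same template as additional lines.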

image descriptor, language model, strong few-shot video-language learner, (6 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.58)

Adaptive Thresholding for Visual Place Recognition using Negative Gaussian Mixture Statistics

Nick Trinh, Damian Lyons

arXiv.org Artificial Intelligence, Dec-11-2025

Visual place recognition (VPR) is an important component technology for camera-based mapping and navigation applications. This is a challenging problem because images of the same place may appear quite different for reasons including seasonal changes, weather, illumination, structural changes to the environment, and transient pedestrian or vehicle traffic. Papers focusing on generating image descriptors for VPR report their results using metrics such as recall@K and ROC curves. However, for a robot implementation, determining which matches are sufficiently good is often reduced to a manually set threshold, and it is difficult to manually select a threshold that will work for a variety of visual scenarios. This paper addresses the problem of automatically selecting a threshold for VPR by looking at the 'negative' Gaussian mixture statistics for a place, that is, image statistics indicating "not this place". We show that this approach can be used to select thresholds that work well for a variety of image databases and image descriptors.
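As a rough illustration of choosing a threshold from "negative" statistics rather than by hand, the sketch below fits a single Gaussian to descriptor distances between a place and images known not to show it, then places the threshold at a low quantile of that distribution. This simplifies the paper's Gaussian mixture to one component, and the function names, the distance-based matching rule, and the `false_positive_rate` parameter are assumptions made for the example.

```python
from statistics import NormalDist, mean, stdev

def negative_threshold(negative_distances, false_positive_rate=0.01):
    """Pick a distance threshold from distances to known non-matching images.

    Fits one Gaussian to the negative distances (a simplification of a
    Gaussian mixture) and returns the distance below which only about
    `false_positive_rate` of negatives are expected to fall.
    Needs at least two distances that are not all equal.
    """
    mu = mean(negative_distances)
    sigma = stdev(negative_distances)
    return NormalDist(mu, sigma).inv_cdf(false_positive_rate)

def is_match(distance, threshold):
    """Accept a candidate only if it is closer than typical negatives."""
    return distance < threshold

# Distances from one place's descriptor to images of other places.
negatives = [0.8, 0.9, 1.0, 1.1, 1.2]
threshold = negative_threshold(negatives)
```

Because the threshold is derived per place from its own negatives, it adapts to descriptors and datasets with different distance scales, which is the practical appeal over a single hand-tuned value.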

artificial intelligence, machine learning, threshold, (14 more...)

arXiv:2512.09071

Genre: Research Report > New Finding (0.47)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

BinGAN: Learning Compact Binary Descriptors with a Regularized GAN

Neural Information Processing Systems, Nov-20-2025, 23:13:40 GMT

In this paper, we propose a novel regularization method for Generative Adversarial Networks that allows the model to learn discriminative yet compact binary representations of image patches (image descriptors). We exploit the dimensionality reduction that takes place in the intermediate layers of the discriminator network and train the binarized penultimate layer's low-dimensional representation to mimic the distribution of the higher-dimensional preceding layers. To achieve this, we introduce two loss terms that aim at: (i) reducing the correlation between the dimensions of the binarized penultimate layer's low-dimensional representation (i.e.

learning compact binary descriptor, name change, regularized gan, (6 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.65)

Practical Deep Stereo (PDS): Toward applications-friendly deep stereo matching

Stepan Tulyakov, Anton Ivanov, François Fleuret

Neural Information Processing Systems, Nov-20-2025, 19:16:33 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, disparity, machine learning, (19 more...)

Country:

Europe > Switzerland > Vaud > Lausanne (0.05)
North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report (0.47)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners

Neural Information Processing Systems, Oct-10-2024, 15:53:08 GMT

The goal of this work is to build flexible video-language models that can generalize to various video-to-text tasks from few examples. Existing few-shot video-language learners focus exclusively on the encoder, resulting in the absence of a video-to-text decoder to handle generative tasks. Video captioners have been pretrained on large-scale video-language datasets, but they rely heavily on finetuning and lack the ability to generate text for unseen tasks in a few-shot setting. We propose VidIL, a few-shot Video-language Learner via Image and Language models, which demonstrates strong performance on few-shot video-to-text tasks without the necessity of pretraining or finetuning on any video datasets. We use image-language models to translate the video content into frame captions, object, attribute, and event phrases, and compose them into a temporal-aware template.

image descriptor, language model, strong few-shot video-language learner, (5 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)
