AITopics | Pattern Recognition

d6428eecbe0f7dff83fc607c5044b2b9-Paper.pdf

Neural Information Processing Systems | Mar-20-2025, 19:30:45 GMT

artificial intelligence, machine learning, registration, (14 more...)

Neural Information Processing Systems

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.68)

Industry:

Health & Medicine > Therapeutic Area > Oncology (0.93)
Health & Medicine > Diagnostic Medicine > Imaging (0.68)
Government (0.68)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

bf15e9bbff22c7719020f9df4badc20a-Paper.pdf

Neural Information Processing Systems | Mar-20-2025, 11:14:29 GMT

Humans have an inherent ability to learn novel concepts from only a few samples and generalize these concepts to different situations. Even though today's machine learning models excel with a plethora of training data on standard recognition tasks, a considerable gap exists between machine-level pattern recognition and human-level concept learning. To narrow this gap, the Bongard problems (BPs) were introduced as an inspirational challenge for visual cognition in intelligent systems. Despite new advances in representation learning and learning to learn, BPs remain a daunting challenge for modern AI.

benchmark, machine learning, pattern recognition, (17 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Genre: Research Report (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.68)

8bf1211fd4b7b94528899de0a43b9fb3-Paper.pdf

Neural Information Processing Systems | Mar-20-2025, 06:14:46 GMT

artificial intelligence, data mining, machine learning, (21 more...)

Neural Information Processing Systems

Country: Europe > France (0.28)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Data Science (0.94)
(4 more...)

8171ac2c5544a5cb54ac0f38bf477af4-Paper.pdf

Neural Information Processing Systems | Mar-20-2025, 01:38:52 GMT

large language model, machine learning, pattern recognition, (21 more...)

Neural Information Processing Systems

Industry: Health & Medicine > Therapeutic Area > Neurology (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
(2 more...)

Enhancing Zero-Shot Image Recognition in Vision-Language Models through Human-like Concept Guidance

Liu, Hui, Wang, Wenya, Chen, Kecheng, Liu, Jie, Liu, Yibing, Qin, Tiexin, He, Peisong, Jiang, Xinghao, Li, Haoliang

arXiv.org Artificial Intelligence | Mar-20-2025

In zero-shot image recognition tasks, humans demonstrate remarkable flexibility in classifying unseen categories by composing known simpler concepts. However, existing vision-language models (VLMs), despite achieving significant progress through large-scale natural language supervision, often underperform in real-world applications because of sub-optimal prompt engineering and the inability to adapt effectively to target classes. To address these issues, we propose a Concept-guided Human-like Bayesian Reasoning (CHBR) framework. Grounded in Bayes' theorem, CHBR models the concepts used in human image recognition as latent variables and formulates the task as a sum over potential concepts, weighted by a prior distribution and a likelihood function. To tackle the intractable computation over an infinite concept space, we introduce an importance sampling algorithm that iteratively prompts large language models (LLMs) to generate discriminative concepts, emphasizing inter-class differences. We further propose three heuristic approaches involving Average Likelihood, Confidence Likelihood, and Test Time Augmentation (TTA) Likelihood, which dynamically refine the combination of concepts based on the test image. Extensive evaluations across fifteen datasets demonstrate that CHBR consistently outperforms existing state-of-the-art zero-shot generalization methods.
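
The idea of summing over latent concepts can be illustrated with a minimal sketch. It assumes a finite set of already-sampled concepts, each with a non-negative weight standing in for the prior and likelihood terms, and a per-concept class distribution. The function name and the toy numbers are hypothetical; the abstract does not specify how CHBR computes these quantities, so this is not the authors' implementation.

```python
from typing import Sequence


def marginalize_over_concepts(
    class_probs_per_concept: Sequence[Sequence[float]],
    concept_weights: Sequence[float],
) -> list[float]:
    """Approximate p(y | x) as a weighted sum of p(y | x, c) over sampled concepts c.

    concept_weights are unnormalized, non-negative weights (an assumed
    stand-in for prior times likelihood); they are normalized here.
    """
    if len(class_probs_per_concept) != len(concept_weights):
        raise ValueError("need exactly one weight per concept")
    total = sum(concept_weights)
    if total <= 0:
        raise ValueError("concept weights must have a positive sum")

    num_classes = len(class_probs_per_concept[0])
    posterior = [0.0] * num_classes
    for probs, weight in zip(class_probs_per_concept, concept_weights):
        for k, p in enumerate(probs):
            posterior[k] += (weight / total) * p
    return posterior


# Toy example: two concepts, two classes. The second concept carries
# three times the weight of the first, so it dominates the prediction.
toy_posterior = marginalize_over_concepts([[0.9, 0.1], [0.2, 0.8]], [1.0, 3.0])
```

With these toy inputs the normalized weights are 0.25 and 0.75, so the second class receives the larger share of probability mass.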

large language model, machine learning, pattern recognition, (20 more...)

arXiv.org Artificial Intelligence

2503.15886

Country:

Asia > China (0.46)
Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report > New Finding (0.67)

Industry: Transportation (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Image Matching (0.82)

Accurate Scene Text Recognition with Efficient Model Scaling and Cloze Self-Distillation

Maracani, Andrea, Ozkan, Savas, Cho, Sijun, Kim, Hyowon, Noh, Eunchung, Min, Jeongwon, Min, Cho Jung, Park, Dookun, Ozay, Mete

arXiv.org Artificial Intelligence | Mar-20-2025

Scaling architectures has proven effective for improving Scene Text Recognition (STR), but the individual contributions of vision encoder and text decoder scaling remain under-explored. In this work, we present an in-depth empirical analysis and demonstrate that, contrary to previous observations, scaling the decoder yields significant performance gains, always exceeding those achieved by encoder scaling alone. We also identify label noise as a key challenge in STR, particularly in real-world data, which can limit the effectiveness of STR models. To address this, we propose Cloze Self-Distillation (CSD), a method that mitigates label noise by distilling a student model from context-aware soft predictions and pseudolabels generated by a teacher model. Additionally, we enhance the decoder architecture by introducing differential cross-attention for STR. Our methodology achieves state-of-the-art performance on 10 out of 11 benchmarks using only real data, while significantly reducing the parameter size and computational costs.
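
Distilling from soft predictions can be illustrated with a generic soft-target cross-entropy loss for a single character position. This is a standard knowledge-distillation sketch, not the CSD procedure itself: the cloze-style context masking and pseudolabel handling are not described in the abstract and are omitted. All names are hypothetical.

```python
import math
from typing import Sequence


def log_softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable log-softmax over one position's character logits."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return [z - log_norm for z in logits]


def soft_target_cross_entropy(
    student_logits: Sequence[float],
    teacher_probs: Sequence[float],
) -> float:
    """Cross-entropy between the teacher's soft distribution and the student.

    Lower values mean the student's predicted distribution is closer to
    the teacher's soft prediction for this position.
    """
    if len(student_logits) != len(teacher_probs):
        raise ValueError("student and teacher must cover the same classes")
    log_probs = log_softmax(student_logits)
    return -sum(t * lp for t, lp in zip(teacher_probs, log_probs))
```

A soft target such as [0.7, 0.3] penalizes a confident wrong student less harshly than a one-hot label would, which is one common motivation for using soft predictions when labels are noisy.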

large language model, machine learning, pattern recognition, (19 more...)

arXiv.org Artificial Intelligence

2503.16184

Genre: Research Report > Promising Solution (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Text Recognition (0.62)

Not All Images are Worth 16x16 Words: Dynamic Transformers for Efficient Image Recognition

Neural Information Processing Systems | Mar-19-2025, 13:50:12 GMT

Vision Transformers (ViT) have achieved remarkable success in large-scale image recognition. They split every 2D image into a fixed number of patches, each of which is treated as a token. Generally, representing an image with more tokens would lead to higher prediction accuracy, while it also results in drastically increased computational cost. To achieve a decent trade-off between accuracy and speed, the number of tokens is empirically set to 16x16 or 14x14. In this paper, we argue that every image has its own characteristics, and ideally the token number should be conditioned on each individual input.
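
The trade-off between token count and cost can be made concrete with a small calculation. The sketch below assumes a square 224 x 224 input, which the abstract does not state, and uses the number of token pairs as a rough proxy for self-attention cost. The helper names are hypothetical.

```python
def num_patch_tokens(image_size: int, patch_size: int) -> int:
    """Number of non-overlapping square patches (tokens) in a square image."""
    if image_size <= 0 or patch_size <= 0:
        raise ValueError("sizes must be positive")
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size
    return per_side * per_side


def attention_pairs(num_tokens: int) -> int:
    """Token pairs compared by full self-attention: quadratic in token count."""
    return num_tokens * num_tokens


# Assumed 224 x 224 input: 16-pixel patches give a 14x14 token grid,
# 14-pixel patches give a 16x16 token grid.
tokens_14x14 = num_patch_tokens(224, 16)  # 196 tokens
tokens_16x16 = num_patch_tokens(224, 14)  # 256 tokens
```

Because the pair count grows with the square of the token count, even a modest increase in tokens raises attention cost noticeably, which is the motivation for choosing the token number per input.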

machine learning, natural language, pattern recognition, (15 more...)

Neural Information Processing Systems

Country: North America > United States > Minnesota (0.28)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Large-Scale Methods for Distributionally Robust Optimization

Levy, Daniel

Neural Information Processing Systems | Mar-19-2025, 06:25:33 GMT

We prove that our algorithms require a number of gradient evaluations independent of training set size and number of parameters, making them suitable for large-scale applications.

artificial intelligence, machine learning, pattern recognition, (17 more...)

Neural Information Processing Systems

Country: North America > United States (0.67)

Industry:

Education (0.46)
Banking & Finance (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
(2 more...)

Meta-Album: Multi-domain Meta-Dataset for Few-Shot Image Classification

Neural Information Processing Systems | Mar-18-2025, 18:38:19 GMT

We introduce Meta-Album, an image classification meta-dataset designed to facilitate few-shot learning, transfer learning, and meta-learning, among other tasks. It includes 40 open datasets, each having at least 20 classes with 40 examples per class, with verified licences. They stem from diverse domains, such as ecology (fauna and flora), manufacturing (textures, vehicles), human actions, and optical character recognition, featuring various image scales (microscopic, human scales, remote sensing). All datasets are preprocessed, annotated, and formatted uniformly, and come in 3 versions (Micro, Mini, Extended) to match users' computational resources.
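
A dataset with at least 20 classes and 40 examples per class is large enough to draw standard few-shot episodes. The sketch below shows generic N-way K-shot episode sampling over an in-memory mapping from class label to examples. It is not the Meta-Album API; the function name, the data layout, and the episode sizes are assumptions for illustration.

```python
import random
from typing import Hashable, Mapping, Sequence


def sample_episode(
    examples_by_class: Mapping[Hashable, Sequence],
    n_way: int,
    k_shot: int,
    n_query: int,
    rng: random.Random,
):
    """Draw one few-shot episode: a support set and a disjoint query set.

    Picks n_way classes, then k_shot support and n_query query examples
    per class, all without replacement.
    """
    if n_way > len(examples_by_class):
        raise ValueError("not enough classes for the requested n_way")
    # Sort the labels so the draw is reproducible for a given seed.
    classes = rng.sample(sorted(examples_by_class), n_way)
    support, query = [], []
    for label in classes:
        pool = list(examples_by_class[label])
        if len(pool) < k_shot + n_query:
            raise ValueError(f"class {label!r} has too few examples")
        picked = rng.sample(pool, k_shot + n_query)
        support += [(item, label) for item in picked[:k_shot]]
        query += [(item, label) for item in picked[k_shot:]]
    return support, query
```

For example, with 20 classes of 40 items each, a 5-way 5-shot episode with 15 queries per class uses 20 of the 40 available examples in each selected class.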

artificial intelligence, machine learning, pattern recognition, (17 more...)

Neural Information Processing Systems

Country:

North America > United States (1.00)
Asia (0.93)
Europe > United Kingdom > England (0.28)

Genre:

Research Report > New Finding (0.68)
Research Report > Experimental Study (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.62)
(3 more...)

The Hateful Memes Challenge: Detecting Hate Speech in Multimodal Memes

Neural Information Processing Systems | Mar-18-2025, 07:42:33 GMT

This work proposes a new challenge set for multimodal classification, focusing on detecting hate speech in multimodal memes. It is constructed such that unimodal models struggle and only multimodal models can succeed: difficult examples ("benign confounders") are added to the dataset to make it hard to rely on unimodal signals. The task requires subtle reasoning, yet is straightforward to evaluate as a binary classification problem. We provide baseline performance numbers for unimodal models, as well as for multimodal models with various degrees of sophistication. We find that state-of-the-art methods perform poorly compared to humans, illustrating the difficulty of the task and highlighting the challenge that this important problem poses to the community.

arxiv preprint arxiv, machine learning, pattern recognition, (17 more...)

Neural Information Processing Systems

Country: North America (0.46)

Genre: Research Report (0.34)

Industry:

Information Technology > Services (1.00)
Law Enforcement & Public Safety > Terrorism (0.68)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(3 more...)
