AITopics | fgvc

Maximum-Entropy Fine Grained Classification

Neural Information Processing Systems, Mar-16-2026, 17:58:57 GMT

Fine-Grained Visual Classification (FGVC) is an important computer vision problem that involves small diversity within the different classes, and often requires expert annotators to collect data. Utilizing this notion of small visual diversity, we revisit Maximum-Entropy learning in the context of fine-grained classification, and provide a training routine that maximizes the entropy of the output probability distribution for training convolutional neural networks on FGVC tasks. We provide a theoretical as well as empirical justification of our approach, and achieve state-of-the-art performance across a variety of classification tasks in FGVC; the approach can potentially be extended to any fine-tuning task. Our method is robust to different hyperparameter values, amounts of training data, and amounts of training label noise, and can hence be a valuable tool in many similar problems.
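
To make the idea in this abstract concrete, here is a minimal Python sketch of an entropy-regularized classification loss. The abstract only says that training maximizes the entropy of the output probability distribution. The specific form used here (cross-entropy minus a weighted entropy term), the function names, and the weight `gamma` are assumptions for illustration, not the authors' implementation.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats; terms with p == 0 contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def max_entropy_loss(logits, label, gamma=1.0):
    """Cross-entropy minus gamma times the entropy of the prediction.

    gamma is a hypothetical weighting hyperparameter (an assumption,
    not a value taken from the paper). Minimizing this loss rewards
    correct predictions while penalizing overly peaked outputs.
    """
    probs = softmax(logits)
    cross_entropy = -math.log(probs[label])
    return cross_entropy - gamma * entropy(probs)

# A very confident prediction has lower entropy than a softer one,
# so the regularizer pushes against the confident case more strongly.
peaked = entropy(softmax([10.0, 0.0, 0.0]))
soft = entropy(softmax([1.0, 0.0, 0.0]))
```

Setting `gamma=0.0` recovers plain cross-entropy, which gives a simple baseline for comparison under the same assumptions.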

artificial intelligence, machine learning, proceedings, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.62)

The Loupe: A Plug-and-Play Attention Module for Amplifying Discriminative Features in Vision Transformers

Sengodan, Naren

arXiv.org Artificial Intelligence, Aug-26-2025

Fine-Grained Visual Classification (FGVC) is a critical and challenging area within computer vision, demanding the identification of highly subtle, localized visual cues. The importance of FGVC extends to critical applications such as biodiversity monitoring and medical diagnostics, where precision is paramount. While large-scale Vision Transformers have achieved state-of-the-art performance, their decision-making processes often lack the interpretability required for trust and verification in such domains. In this paper, we introduce The Loupe, a novel, lightweight, and plug-and-play attention module designed to be inserted into pre-trained backbones like the Swin Transformer. The Loupe is trained end-to-end with a composite loss function that implicitly guides the model to focus on the most discriminative object parts without requiring explicit part-level annotations. Our unique contribution lies in demonstrating that a simple, intrinsic attention mechanism can act as a powerful regularizer, significantly boosting performance while simultaneously providing clear visual explanations. Our experimental evaluation on the challenging CUB-200-2011 dataset shows that The Loupe improves the accuracy of a Swin-Base model from 85.40% to 88.06%, a gain of 2.66 percentage points. Crucially, our qualitative analysis of the learned attention maps reveals that The Loupe effectively localizes semantically meaningful features, providing a valuable tool for understanding and trusting the model's decision-making process.

artificial intelligence, machine learning, transformer, (20 more...)

arXiv.org Artificial Intelligence

2508.16663

Country: North America > United States > California (0.46)

Genre: Research Report (0.64)

Industry: Health & Medicine > Diagnostic Medicine (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Relational Proxies: Emergent Relationships as Fine-Grained Discriminators

Neural Information Processing Systems, Aug-18-2025, 22:01:01 GMT

We also experimentally validate our theory on fine-grained distinguishability and obtain consistent results across multiple benchmarks.

artificial intelligence, information, machine learning, (16 more...)

Neural Information Processing Systems

Country:

Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
North America > United States > California (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Information Technology (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Enhancing Fine-Grained 3D Object Recognition using Hybrid Multi-Modal Vision Transformer-CNN Models

Xiong, Songsong, Tziafas, Georgios, Kasaei, Hamidreza

arXiv.org Artificial Intelligence, Mar-6-2023

Robots operating in human-centered environments, such as retail stores, restaurants, and households, are often required to distinguish between similar objects in different contexts with a high degree of accuracy. However, fine-grained object recognition remains a challenge in robotics due to the high intra-category and low inter-category dissimilarities. In addition, the limited number of fine-grained 3D datasets poses a significant problem in addressing this issue effectively. In this paper, we propose a hybrid multi-modal Vision Transformer (ViT) and Convolutional Neural Networks (CNN) approach to improve the performance of fine-grained visual classification (FGVC). To address the shortage of FGVC 3D datasets, we generated two synthetic datasets. The first dataset consists of 20 categories related to restaurants with a total of 100 instances, while the second dataset contains 120 shoe instances. Our approach was evaluated on both datasets, and the results indicate that it outperforms both CNN-only and ViT-only baselines, achieving a recognition accuracy of 94.50% and 93.51% on the restaurant and shoe datasets, respectively. Additionally, we have made our FGVC RGB-D datasets available to the research community to enable further experimentation and advancement. Furthermore, we successfully integrated our proposed method with a robot framework and demonstrated its potential as a fine-grained perception tool in both simulated and real-world robotic scenarios.

artificial intelligence, deep learning, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2210.04613

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Arizona (0.04)
Europe > Netherlands (0.04)
(2 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine (0.68)
Consumer Products & Services > Restaurants (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Relational Proxies: Emergent Relationships as Fine-Grained Discriminators

Chaudhuri, Abhra, Mancini, Massimiliano, Akata, Zeynep, Dutta, Anjan

arXiv.org Artificial Intelligence, Oct-5-2022

Fine-grained categories that largely share the same set of parts cannot be discriminated based on part information alone, as they mostly differ in the way the local parts relate to the overall global structure of the object. We propose Relational Proxies, a novel approach that leverages the relational information between the global and local views of an object for encoding its semantic label. Starting with a rigorous formalization of the notion of distinguishability between fine-grained categories, we prove the necessary and sufficient conditions that a model must satisfy in order to learn the underlying decision boundaries in the fine-grained setting. We design Relational Proxies based on our theoretical findings and evaluate it on seven challenging fine-grained benchmark datasets, achieving state-of-the-art results on all of them and surpassing all existing works by a margin exceeding 4% in some cases. We also experimentally validate our theory on fine-grained distinguishability and obtain consistent results across multiple benchmarks.

artificial intelligence, machine learning, representation, (16 more...)

arXiv.org Artificial Intelligence

2210.02149

Country:

Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
North America > United States > California (0.04)

Genre: Research Report > New Finding (0.68)

Industry:

Transportation (0.46)
Information Technology (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.87)

AI Researchers From China Designed An Image Classification Algorithm, FGVC, Based On …

#artificialintelligence, Jul-2-2022, 06:50:26 GMT

Mahmoud is a PhD researcher in machine learning. He also holds a bachelor's degree in physical science and a master's degree in telecommunications …

ai researcher, china designed, image classification algorithm, (1 more...)

#artificialintelligence

Country: Asia > China (0.40)

Industry:

Media > News (0.67)
Telecommunications (0.51)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.96)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.40)

Maximum-Entropy Fine Grained Classification

Dubey, Abhimanyu, Gupta, Otkrist, Raskar, Ramesh, Naik, Nikhil

Neural Information Processing Systems, Feb-14-2020, 06:11:23 GMT

Fine-Grained Visual Classification (FGVC) is an important computer vision problem that involves small diversity within the different classes, and often requires expert annotators to collect data. Utilizing this notion of small visual diversity, we revisit Maximum-Entropy learning in the context of fine-grained classification, and provide a training routine that maximizes the entropy of the output probability distribution for training convolutional neural networks on FGVC tasks. We provide a theoretical as well as empirical justification of our approach, and achieve state-of-the-art performance across a variety of classification tasks in FGVC; the approach can potentially be extended to any fine-tuning task. Our method is robust to different hyperparameter values, amounts of training data, and amounts of training label noise, and can hence be a valuable tool in many similar problems. Papers published at the Neural Information Processing Systems Conference.

diversity, fgvc, maximum-entropy fine grained classification

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.66)

Fine-Grained Visual Categorization via Multi-stage Metric Learning

Qian, Qi, Jin, Rong, Zhu, Shenghuo, Lin, Yuanqing

arXiv.org Machine Learning, Jun-4-2015

Fine-grained visual categorization (FGVC) aims to categorize objects into subordinate classes instead of basic classes. One major challenge in FGVC is the co-occurrence of two issues: 1) many subordinate classes are highly correlated and are difficult to distinguish, and 2) there is large intra-class variation (e.g., due to object pose). This paper proposes to explicitly address these two issues via distance metric learning (DML). DML addresses the first issue by learning an embedding in which data points from the same class are pulled together while those from different classes are pushed apart; it addresses the second issue by allowing the flexibility that only a portion of the neighbors (not all data points) from the same class need to be pulled together. However, the feature representation of an image is often high dimensional, and DML is known to have difficulty with high dimensional feature vectors, since it requires $\mathcal{O}(d^2)$ for storage and $\mathcal{O}(d^3)$ for optimization. To this end, we propose a multi-stage metric learning framework that divides the large-scale high dimensional learning problem into a series of simple subproblems, achieving $\mathcal{O}(d)$ computational complexity. The empirical study with FGVC benchmark datasets verifies that our method is both effective and efficient compared to the state-of-the-art FGVC approaches.
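
The $\mathcal{O}(d^2)$ storage cost mentioned in this abstract can be illustrated with a small Python sketch of a dense metric. It assumes the learned metric is a full d x d matrix `M` used in a squared Mahalanobis-style distance. The function names and the 8-byte (float64) entry size are hypothetical choices for illustration. This is not the multi-stage method from the paper.

```python
def mahalanobis_sq(x, y, M):
    """Squared distance (x - y)^T M (x - y) for a dense d x d metric M.

    M is given as a list of d rows, each a list of d floats.
    """
    diff = [a - b for a, b in zip(x, y)]
    # Matrix-vector product M @ diff, one row at a time.
    m_diff = [sum(m_ij * d_j for m_ij, d_j in zip(row, diff)) for row in M]
    return sum(d_i * md_i for d_i, md_i in zip(diff, m_diff))

def full_metric_bytes(d, bytes_per_entry=8):
    """Bytes needed to store a dense d x d metric (8 bytes assumes float64)."""
    return d * d * bytes_per_entry

# With the identity matrix as the metric, the distance reduces to
# the squared Euclidean distance.
identity = [[1.0, 0.0], [0.0, 1.0]]
dist = mahalanobis_sq([1.0, 2.0], [0.0, 0.0], identity)

# Storage grows quadratically: 10x more dimensions costs 100x more memory.
small = full_metric_bytes(1_000)
large = full_metric_bytes(10_000)
```

Under these assumptions, a 100,000-dimensional feature vector would need 80 GB for the dense metric alone, which is the kind of cost that motivates a method with lower complexity.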

artificial intelligence, constraint, machine learning, (16 more...)

arXiv.org Machine Learning

1402.0453

Country: North America > United States > Michigan (0.28)

Genre: Research Report (1.00)

Industry: Education (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
