A Two-Stage Approach to Few-Shot Learning for Image Recognition

Das, Debasmit; Lee, C. S. George

arXiv.org Machine Learning 

This paper proposes a multi-layer neural network structure for few-shot image recognition of novel categories. The proposed architecture encodes transferable knowledge extracted from a large annotated dataset of base categories and is then applied to novel categories containing only a few samples. The transfer of knowledge is carried out at the feature-extraction and classification levels, distributed across the two training stages. In the first training stage, we introduce the relative feature to capture the structure of the data and to obtain a low-dimensional discriminative space. We also account for the differing variances of the categories by using a network to predict the variance of each class. Classification is then performed by computing the Mahalanobis distance to the mean-class representation, in contrast to previous approaches that used the Euclidean distance. In the second training stage, a category-agnostic mapping is learned from the mean-sample representation to its corresponding class-prototype representation, because the mean-sample representation may not accurately represent the novel category prototype. Finally, we evaluate the proposed network structure on four standard few-shot image recognition datasets, where our few-shot learning system produces competitive performance compared to previous work. We also extensively study and analyze the contribution of each component of the proposed framework.

For the past decade, deep convolutional neural networks (CNNs) have produced excellent results in visual recognition tasks such as object recognition and scene classification [1]-[3]. A CNN learns to recognize a large number of visual categories by training on a large collection of annotated images using a gradient-descent technique [4]. Although the training procedure is computationally intensive, it can be parallelized using a Graphics Processing Unit (GPU).
Even after a long training period, a CNN can recognize only a fixed set of image categories. To learn to recognize novel categories, one has to collect new training data and retrain the CNN model with further adjustments. Unfortunately, in some cases there might not be enough labeled data available for training on a novel category. Object categories follow a long-tailed distribution, with many rare classes and very few common classes. In such a distribution, only a few object categories occur frequently.

This work was supported in part by the National Science Foundation under Grant IIS-1813935. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation. We also gratefully acknowledge the support of NVIDIA Corporation for the donation of a TITAN Xp GPU used for this research.
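
The classification step described in the abstract (nearest class mean under a Mahalanobis distance with a per-class variance) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a diagonal covariance per class, it uses hand-set variances where the paper uses a network to predict them, and the function names and toy data are hypothetical.

```python
def class_means(support):
    """Map {label: [feature vectors]} to {label: mean feature vector}."""
    means = {}
    for label, vectors in support.items():
        dim = len(vectors[0])
        means[label] = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return means


def mahalanobis_sq(x, mean, variance):
    """Squared Mahalanobis distance, assuming a diagonal covariance."""
    return sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, variance))


def classify(x, means, variances):
    """Return the label whose class mean is closest to x."""
    return min(means, key=lambda label: mahalanobis_sq(x, means[label], variances[label]))


# Toy few-shot episode: two classes, two support samples each (hypothetical data).
support = {
    "cat": [[0.0, 0.0], [2.0, 0.0]],
    "dog": [[4.0, 0.0], [6.0, 0.0]],
}
means = class_means(support)
query = [2.5, 0.0]

# With unit variances the distance reduces to the squared Euclidean distance.
unit_variances = {"cat": [1.0, 1.0], "dog": [1.0, 1.0]}
euclidean_label = classify(query, means, unit_variances)

# Hand-set stand-ins for predicted variances: "cat" is tight, "dog" is spread out.
class_variances = {"cat": [0.25, 0.25], "dog": [4.0, 4.0]}
mahalanobis_label = classify(query, means, class_variances)
```

In this toy episode the query is closer to the "cat" mean in Euclidean terms, but it is assigned to "dog" once the larger spread of that class is taken into account. This is the kind of difference that motivates using per-class variances.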
