Identifying and Compensating for Feature Deviation in Imbalanced Deep Learning
Han-Jia Ye, Hong-You Chen, De-Chuan Zhan, Wei-Lun Chao
In practice, however, we frequently encounter training data with a class-imbalanced distribution. For example, modern real-world large-scale datasets often follow a so-called long-tailed distribution: a few major classes claim most of the instances, while most of the other minor classes are represented by relatively few instances [16, 31, 38, 50, 51, 61]. Classifiers trained on such datasets with conventional strategies (e.g., mini-batch SGD on uniformly sampled instances) have been found to perform poorly on minor classes [3, 19, 40, 52], which is particularly unfavorable if we evaluate the classifiers on class-balanced test data or by average per-class accuracy. One common explanation of the poor performance is over-fitting to the minor classes.

Figure 1: Over-fitting to minor classes and feature deviation: (top-left) the number of training (red) and test (blue) instances per class of an imbalanced CIFAR-10 [8, 32]; (top-right) the training and test set accuracy per class using a ResNet [20]; (bottom) the t-SNE [41] plot of the training (circle) and test (cross) features before the last linear classifier layer. We see a trend of over-fitting to minor classes, which results from the feature deviation between training and test instances (see the magenta and red minor classes).
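To make the setting concrete, the following is a minimal sketch of the two quantities the paragraph refers to: an exponential long-tailed class-size profile (the profile commonly used to build imbalanced CIFAR variants) and the average per-class accuracy metric, under which each class counts equally regardless of its size. The function names and the exponential profile are illustrative assumptions, not the paper's code.

```python
import numpy as np

def long_tailed_counts(n_max, n_classes, imb_ratio):
    """Exponential long-tailed profile (an assumed, commonly used
    construction): class c receives
    n_max * (1/imb_ratio) ** (c / (n_classes - 1)) instances,
    so the largest class has n_max and the smallest n_max/imb_ratio."""
    return [int(n_max * (1.0 / imb_ratio) ** (c / (n_classes - 1)))
            for c in range(n_classes)]

def per_class_accuracy(y_true, y_pred, n_classes):
    """Average per-class accuracy: compute accuracy within each class
    that appears in y_true, then average over classes, so a classifier
    that ignores minor classes is penalized."""
    accs = []
    for c in range(n_classes):
        mask = (y_true == c)
        if mask.any():
            accs.append(float((y_pred[mask] == c).mean()))
    return float(np.mean(accs))
```

For example, `long_tailed_counts(5000, 10, 100)` yields class sizes decaying from 5000 down to 50, mimicking the imbalanced CIFAR-10 setup in Figure 1; a classifier that predicts only the majority class can still score high overall accuracy but scores poorly under `per_class_accuracy`.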
Jan-5-2020