Extracting Clean and Balanced Subset for Noisy Long-tailed Classification

Li, Zhuo, Zhao, He, Li, Zhen, Liu, Tongliang, Guo, Dandan, Wan, Xiang

Apr-10-2024–arXiv.org Artificial Intelligence

Real-world datasets are usually class-imbalanced and corrupted by label noise. To address the joint issue of long-tailed distribution and label noise, most previous works design a noise detector to distinguish noisy samples from clean ones. Despite their effectiveness, these methods may be limited in handling the two problems in a unified way. In this work, we develop a novel pseudo-labeling method using class prototypes from the perspective of distribution matching, which can be solved with optimal transport (OT). By setting a manually specified probability measure and using a learned transport plan to pseudo-label the training samples, the proposed method can reduce the side effects of noisy and long-tailed data simultaneously. We then introduce a simple yet effective filtering criterion that combines the observed labels and pseudo-labels to obtain a more balanced and less noisy subset for robust model training. Extensive experiments demonstrate that our method can extract this class-balanced subset with clean labels, which brings performance gains for long-tailed classification with label noise.
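
The pipeline outlined in the abstract (match samples to class prototypes with OT, pseudo-label from the transport plan, then filter against the observed labels) could look roughly like the sketch below. This is an illustration under stated assumptions, not the authors' implementation. The abstract does not name an OT solver, a cost function, the specific probability measure, or the exact filtering rule. Here the solver is entropic Sinkhorn, the cost is cosine distance, the prototype-side measure is uniform over classes, and a sample is kept only when its pseudo-label agrees with its observed label. The function names `sinkhorn_plan` and `extract_subset` are hypothetical.

```python
import numpy as np

def sinkhorn_plan(cost, row_marginal, col_marginal, reg=0.1, n_iters=200):
    """Entropic-regularized OT plan (Sinkhorn iterations; an assumed solver)."""
    kernel = np.exp(-cost / reg)
    u = np.ones_like(row_marginal)
    for _ in range(n_iters):
        v = col_marginal / (kernel.T @ u)  # rescale to match column marginal
        u = row_marginal / (kernel @ v)    # rescale to match row marginal
    return u[:, None] * kernel * v[None, :]

def extract_subset(features, observed_labels, prototypes, reg=0.1):
    """Pseudo-label samples via OT to class prototypes, then filter.

    Assumptions (not stated in the abstract): cosine-distance cost,
    uniform measure over samples, uniform (class-balanced) measure over
    prototypes, and an agreement-based filter.
    """
    n, k = features.shape[0], prototypes.shape[0]
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    cost = 1.0 - f @ p.T                       # shape (n, k)
    plan = sinkhorn_plan(cost, np.full(n, 1.0 / n), np.full(k, 1.0 / k), reg)
    pseudo_labels = plan.argmax(axis=1)        # prototype receiving most mass
    keep = pseudo_labels == observed_labels    # assumed filtering criterion
    return pseudo_labels, keep

# Toy usage: four 2-D features, two prototypes, one flipped observed label.
features = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
prototypes = np.array([[1.0, 0.0], [0.0, 1.0]])
observed = np.array([0, 1, 1, 1])  # second sample is mislabeled
pseudo, keep = extract_subset(features, observed, prototypes)
```

In this toy case the second sample lies close to the first prototype, so its pseudo-label disagrees with the observed label and the filter drops it. In practice the prototypes would have to be estimated from data, which this sketch does not cover.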

dataset, extracting clean and balanced subset, noise


Country:
- South America > Paraguay
  - Asunción > Asunción (0.04)
- Oceania > Australia
  - New South Wales > Sydney (0.04)
- Europe
  - Switzerland > Zürich
    - Zürich (0.14)
  - Portugal > Lisbon
    - Lisbon (0.04)
- Asia > China
  - Guangdong Province > Shenzhen (0.04)
  - Hong Kong (0.04)

Genre:
- Research Report (1.00)

Technology:
- Information Technology
  - Data Science (1.00)
  - Artificial Intelligence
    - Vision (0.94)
    - Representation & Reasoning (0.92)
    - Machine Learning
      - Neural Networks (0.47)
      - Statistical Learning (0.46)
