Fill-Up: Balancing Long-Tailed Data with Generative Models

Shin, Joonghyuk; Kang, Minguk; Park, Jaesik

Jun-12-2023–arXiv.org Artificial Intelligence

Modern text-to-image synthesis models have achieved an exceptional level of photorealism, generating high-quality images from arbitrary text descriptions. In light of this synthesis ability, several studies have shown promising results in exploiting generated data for image recognition. However, directly applying existing approaches to data-hungry real-world situations (e.g., few-shot or long-tailed scenarios) results in only marginal performance gains, as they fail to fully reflect the distribution of the real data. Through extensive experiments, this paper proposes a new image synthesis pipeline for long-tailed situations using Textual Inversion. The study demonstrates that images generated from textual-inverted text tokens effectively align with the real domain, significantly enhancing the recognition ability of a standard ResNet50 backbone. We also show that real-world data imbalance scenarios can be successfully mitigated by filling up the imbalanced data with synthetic images. In conjunction with techniques from long-tailed recognition, our method achieves state-of-the-art results on standard long-tailed benchmarks when trained from scratch.
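The idea of "filling up" a long-tailed dataset can be illustrated with a small bookkeeping step: deciding how many synthetic images each class would need. The sketch below is only an illustration, not the authors' pipeline. It assumes that every class is topped up to the size of the largest class, which the abstract does not state, and the function name `fill_up_counts` and the example class labels are hypothetical.

```python
from collections import Counter


def fill_up_counts(labels, target=None):
    """Return the number of synthetic images each class needs to reach `target`.

    Assumption (not from the paper): when `target` is None, every class is
    filled up to the size of the largest class.
    """
    counts = Counter(labels)
    if target is None:
        target = max(counts.values())
    # Classes already at or above the target need no synthetic images.
    return {cls: max(0, target - n) for cls, n in counts.items()}


# Toy long-tailed label list: one head class and two tail classes.
labels = ["cat"] * 100 + ["dog"] * 20 + ["owl"] * 5
needed = fill_up_counts(labels)
print(needed)  # {'cat': 0, 'dog': 80, 'owl': 95}
```

In a full pipeline, these per-class counts would determine how many images to request from the text-to-image model for each class's learned token; that generation step is omitted here.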

artificial intelligence, machine learning, natural language


