A Comprehensive Survey on Data Augmentation

Wang, Zaitian, Wang, Pengfei, Liu, Kunpeng, Wang, Pengyang, Fu, Yanjie, Lu, Chang-Tien, Aggarwal, Charu C., Pei, Jian, Zhou, Yuanchun

May-17-2024–arXiv.org Artificial Intelligence

Data augmentation is a family of techniques that generate high-quality artificial data by manipulating existing data samples. With data augmentation, AI models become considerably more applicable to tasks involving scarce or imbalanced datasets, which in turn substantially improves their generalization capabilities. Existing surveys each focus on a single data modality and categorize methods from modality-specific and operation-centric perspectives. As a result, they lack a consistent summary of data augmentation methods across multiple modalities and limit the understanding of how existing data samples serve the data augmentation process. To bridge this gap, we propose a more informative taxonomy that encompasses data augmentation techniques for different common data modalities. Specifically, from a data-centric perspective, this survey proposes a modality-independent taxonomy based on how methods exploit the intrinsic relationships between data samples, covering single-wise, pair-wise, and population-wise sample data augmentation methods. Additionally, we categorize data augmentation methods across five data modalities through a unified inductive approach.
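
The three sample-relationship categories named in the abstract can be illustrated with a minimal Python sketch. The concrete operations chosen here (additive Gaussian noise, a mixup-style interpolation, and sampling from a per-feature Gaussian fitted to the dataset) are assumptions picked for illustration and are not taken from the abstract. The function names `single_wise`, `pair_wise`, and `population_wise` are likewise hypothetical.

```python
import random
import statistics


def single_wise(sample, noise_scale=0.1, rng=random):
    """Single-wise: transform one sample on its own.

    Illustrative choice (an assumption): additive Gaussian noise.
    """
    return [x + rng.gauss(0.0, noise_scale) for x in sample]


def pair_wise(sample_a, sample_b, lam=0.5):
    """Pair-wise: build a new sample from two existing ones.

    Illustrative choice (an assumption): mixup-style convex interpolation.
    """
    return [lam * a + (1.0 - lam) * b for a, b in zip(sample_a, sample_b)]


def population_wise(population, rng=random):
    """Population-wise: use information from the whole dataset.

    Illustrative choice (an assumption): draw each feature from a Gaussian
    fitted to that feature across all samples.
    """
    columns = list(zip(*population))
    return [rng.gauss(statistics.mean(c), statistics.pstdev(c)) for c in columns]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs match
    data = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
    print(single_wise(data[0], rng=rng))
    print(pair_wise(data[0], data[1], lam=0.25))
    print(population_wise(data, rng=rng))
```

The sketch uses plain lists of floats so that it stays modality-independent. In practice, each category would be instantiated with operations suited to the modality at hand.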


Country:
- Oceania > Australia
  - Western Australia > Perth (0.04)
- North America > United States
  - Virginia > Falls Church (0.04)
  - Oregon > Multnomah County
    - Portland (0.04)
  - North Carolina > Durham County
    - Durham (0.04)
  - Arizona > Maricopa County
    - Tempe (0.04)
- Europe > Ireland
  - Leinster > County Dublin > Dublin (0.04)
- Asia
  - Macao (0.04)
  - Japan > Honshū
    - Chūbu > Nagano Prefecture > Nagano (0.04)
  - China > Beijing
    - Beijing (0.04)

Genre:
- Overview (1.00)

Industry:
- Information Technology (1.00)
- Health & Medicine (0.92)

Technology:
- Information Technology
  - Sensing and Signal Processing > Image Processing (1.00)
  - Data Science > Data Mining (1.00)
  - Artificial Intelligence
    - Vision (1.00)
    - Representation & Reasoning (1.00)
    - Natural Language
      - Large Language Model (0.68)
      - Chatbot (0.67)
      - Text Processing (0.67)
    - Machine Learning > Neural Networks
      - Deep Learning (1.00)
