A Survey on Data Augmentation for Text Classification

Bayer, Markus, Kaufhold, Marc-André, Reuter, Christian

Jul-14-2021–arXiv.org Artificial Intelligence

Data augmentation, the artificial creation of training data for machine learning by transformations, is a widely studied research field across machine learning disciplines. While it is useful for increasing the generalization capabilities of a model, it can also address many other challenges and problems, from overcoming a limited amount of training data, to regularizing the objective, to limiting the amount of data used in order to protect privacy. Based on a precise description of the goals and applications of data augmentation (C1) and a taxonomy for existing works (C2), this survey is concerned with data augmentation methods for text classification and aims to provide a concise and comprehensive overview for researchers and practitioners (C3). Following the taxonomy, we divide more than 100 methods into 12 different groupings and provide state-of-the-art references indicating which methods are highly promising (C4). Finally, research perspectives that may constitute a building block for future work are given (C5).
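
To make the idea of "creating training data by transformations" concrete, here is a minimal sketch of label-preserving augmentation for a text classification set. The abstract does not name specific methods, so the two transformations used here (random word swap and random word deletion) are illustrative assumptions, as are the function names `random_swap`, `random_delete`, and `augment`, the deletion probability, and the toy sentences. None of this is taken from the survey itself.

```python
import random

def random_swap(tokens, n_swaps, rng):
    """Return a copy of tokens with n_swaps random position pairs exchanged."""
    out = list(tokens)
    if len(out) < 2:
        return out
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out

def random_delete(tokens, p, rng):
    """Drop each token with probability p, keeping at least one token."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]

def augment(dataset, copies, rng):
    """Extend (text, label) pairs with transformed copies that reuse the label.

    Assumption: these small perturbations do not change the class of a text.
    That will not hold for every task and should be checked per dataset.
    """
    augmented = list(dataset)
    for text, label in dataset:
        tokens = text.split()
        for _ in range(copies):
            new_tokens = random_delete(random_swap(tokens, 1, rng), 0.1, rng)
            augmented.append((" ".join(new_tokens), label))
    return augmented

# Toy data, invented for illustration only.
rng = random.Random(0)
train = [
    ("the service was slow but friendly", "neutral"),
    ("great food and great prices", "positive"),
]
bigger = augment(train, copies=2, rng=rng)
```

Each original example is kept and followed by `copies` perturbed variants, so the training set here grows from 2 to 6 examples. Whether such variants actually improve generalization depends on the task and the model, which is the kind of question the survey's taxonomy is meant to help answer.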

augmentation, augmentation method, data augmentation

Country:
- North America > United States
  - Texas (0.14)
  - New York (0.04)
- Europe
  - United Kingdom (0.14)
  - Germany > Hesse
    - Darmstadt Region > Darmstadt (0.04)
- Africa > South Africa
  - Gauteng > Soweto (0.04)

Genre:
- Overview (1.00)
- Summary/Review (0.92)
- Research Report
  - New Finding (0.46)
  - Promising Solution (0.45)

Industry:
- Information Technology > Security & Privacy (0.48)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language
    - Text Processing (1.00)
    - Text Classification (0.82)
    - Machine Translation (0.68)
    - Large Language Model (0.68)
  - Machine Learning
    - Statistical Learning (1.00)
    - Neural Networks > Deep Learning (1.00)