A Survey on Data Augmentation for Text Classification
Bayer, Markus, Kaufhold, Marc-André, Reuter, Christian
arXiv.org Artificial Intelligence
Data augmentation, the artificial creation of training data for machine learning by transformations, is a widely studied research field across machine learning disciplines. While it is useful for increasing the generalization capabilities of a model, it can also address many other challenges and problems. These range from overcoming a limited amount of training data, through regularizing the objective, to limiting the amount of data used in order to protect privacy. Based on a precise description of the goals and applications of data augmentation (C1) and a taxonomy for existing works (C2), this survey is concerned with data augmentation methods for text classification and aims to provide a concise and comprehensive overview for researchers and practitioners (C3). Using this taxonomy, we divide more than 100 methods into 12 different groupings and provide state-of-the-art references indicating which methods are highly promising (C4). Finally, we outline research perspectives that may serve as building blocks for future work (C5).
Jul-14-2021
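The abstract defines data augmentation as creating additional training data through transformations. As an illustration, the sketch below applies two simple word-level operations, random deletion and random swap, in the style of EDA (Wei and Zou, 2019). It is a minimal sketch under our own assumptions and is not a method taken from the survey. The function names, the deletion probability of 0.1, the whitespace tokenization, and the premise that these small perturbations leave the class label unchanged are all hypothetical choices.

```python
import random
from typing import List, Tuple


def random_deletion(words: List[str], p: float, rng: random.Random) -> List[str]:
    """Drop each word independently with probability p (hypothetical helper)."""
    if not words:
        return []
    kept = [w for w in words if rng.random() >= p]
    # Avoid producing an empty sentence: fall back to one random original word.
    return kept if kept else [rng.choice(words)]


def random_swap(words: List[str], n_swaps: int, rng: random.Random) -> List[str]:
    """Swap the positions of two randomly chosen words, n_swaps times (hypothetical helper)."""
    out = list(words)
    if len(out) < 2:
        return out
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def augment(dataset: List[Tuple[str, str]], copies: int = 2, seed: int = 0) -> List[Tuple[str, str]]:
    """Return the original (text, label) pairs followed by `copies` variants of each.

    Assumption: both transformations preserve the label, so each variant
    simply inherits the label of its source example.
    """
    rng = random.Random(seed)
    result = list(dataset)
    for text, label in dataset:
        words = text.split()  # naive whitespace tokenization (assumption)
        for k in range(copies):
            if k % 2 == 0:
                new_words = random_deletion(words, p=0.1, rng=rng)
            else:
                new_words = random_swap(words, n_swaps=1, rng=rng)
            result.append((" ".join(new_words), label))
    return result


if __name__ == "__main__":
    data = [("the movie was surprisingly good", "pos"),
            ("a dull and lifeless plot", "neg")]
    for text, label in augment(data):
        print(label, "|", text)
```

The variants alternate between deletion and swap so that each source sentence yields differently perturbed copies. Fixing the seed keeps the augmented set reproducible. Whether such edits really preserve the label depends on the task. Deleting a negation word, for example, can flip the sentiment.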
- Country:
  - North America > United States
  - Europe
    - United Kingdom (0.14)
    - Germany > Hesse
      - Darmstadt Region > Darmstadt (0.04)
  - Africa > South Africa
- Genre:
  - Overview (1.00)
  - Summary/Review (0.92)
  - Research Report
    - New Finding (0.46)
    - Promising Solution (0.45)
- Industry:
  - Information Technology > Security & Privacy (0.48)
- Technology: