Data Augmentation for Neural NLP
Pluščec, Domagoj, Šnajder, Jan
arXiv.org Artificial Intelligence
Data scarcity is a problem that occurs in languages and tasks where we do not have large amounts of labeled data but want to use state-of-the-art models. Such models are often deep learning models that require a significant amount of data to train. Acquiring data for various machine learning problems is accompanied by high labeling costs. Data augmentation is a low-cost approach for tackling data scarcity. This paper gives an overview of current state-of-the-art data augmentation methods used for natural language processing, with an emphasis on methods for neural and transformer-based models. Furthermore, it discusses the practical challenges of data augmentation, possible mitigations, and directions for future research.
Feb-22-2023
- Country:
  - Oceania > Australia
  - North America
    - Dominican Republic (0.04)
    - Mexico (0.04)
    - United States
      - Washington > King County
        - Seattle (0.04)
      - New York > New York County
        - New York City (0.04)
      - Minnesota > Hennepin County
        - Minneapolis (0.14)
  - Europe
  - Asia
  - Africa > South Africa
- Genre:
  - Overview (1.00)
  - Research Report > Promising Solution (0.34)
- Industry:
  - Government (0.68)
  - Information Technology > Security & Privacy (0.46)
  - Media (0.46)
- Technology: