A Survey of Data Augmentation Approaches for NLP
Feng, Steven Y.; Gangal, Varun; Wei, Jason; Chandar, Sarath; Vosoughi, Soroush; Mitamura, Teruko; Hovy, Eduard
arXiv.org Artificial Intelligence
Data augmentation has recently seen increased interest in NLP due to more work in low-resource domains, new tasks, and the popularity of large-scale neural networks that require large amounts of training data. Despite this recent upsurge, this area is still relatively underexplored, perhaps due to the challenges posed by the discrete nature of language data. In this paper, we present a comprehensive and unifying survey of data augmentation for NLP by summarizing the literature in a structured manner. We first introduce and motivate data augmentation for NLP, and then discuss major methodologically representative approaches. Next, we highlight techniques that are used for popular NLP applications and tasks. We conclude by outlining current challenges and directions for future research. Overall, our paper aims to clarify the landscape of existing literature in data augmentation for NLP and motivate additional work in this area.
May-7-2021
- Country:
  - South America > Chile
  - Oceania > Australia
  - North America
    - United States
      - Virginia (0.04)
      - Minnesota > Hennepin County
        - Minneapolis (0.14)
      - Louisiana > Orleans Parish
        - New Orleans (0.04)
      - Massachusetts > Middlesex County
        - Cambridge (0.04)
      - New Mexico > Santa Fe County
        - Santa Fe (0.04)
      - Washington > King County
        - Seattle (0.04)
      - Colorado > Boulder County
        - Boulder (0.04)
      - New York > New York County
        - New York City (0.04)
      - Pennsylvania > Allegheny County
        - Pittsburgh (0.04)
    - Canada
      - Quebec (0.04)
      - British Columbia > Metro Vancouver Regional District
        - Vancouver (0.04)
  - Europe
    - Germany > Berlin (0.04)
    - Netherlands (0.04)
    - Spain
    - Lithuania > Kaunas County
      - Kaunas (0.04)
    - Italy > Tuscany
      - Florence (0.04)
    - Ireland > Leinster
      - County Dublin > Dublin (0.04)
    - Denmark > Capital Region
      - Copenhagen (0.04)
    - Belgium > Brussels-Capital Region
      - Brussels (0.04)
  - Asia
    - Vietnam > Hanoi
      - Hanoi (0.04)
    - Thailand > Chiang Mai
      - Chiang Mai (0.04)
    - Japan > Honshū
      - Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
    - China
- Genre:
  - Research Report (0.40)
- Technology: