Data Augmentation using Large Language Models: Data Perspectives, Learning Paradigms and Challenges

Ding, Bosheng, Qin, Chengwei, Zhao, Ruochen, Luo, Tianze, Li, Xinze, Chen, Guizhen, Xia, Wenhan, Hu, Junjie, Luu, Anh Tuan, Joty, Shafiq

Jul-2-2024–arXiv.org Artificial Intelligence

In the rapidly evolving field of large language models (LLMs), data augmentation (DA) has emerged as a pivotal technique for enhancing model performance by diversifying training examples without the need for additional data collection. This survey explores the transformative impact of LLMs on DA, particularly addressing the unique challenges and opportunities they present in the context of natural language processing (NLP) and beyond. From both data and learning perspectives, we examine various strategies that utilize LLMs for data augmentation, including a novel exploration of learning paradigms where LLM-generated data is used for diverse forms of further training. Additionally, this paper highlights the primary open challenges faced in this domain, ranging from controllable data augmentation to multi-modal data augmentation. This survey highlights a paradigm shift introduced by LLMs in DA, and aims to serve as a comprehensive guide for researchers and practitioners.

data augmentation, language model, llm

Country:
- North America
  - Dominican Republic (0.04)
  - United States > Wisconsin
    - Dane County > Madison (0.04)
  - Canada > Ontario
    - Toronto (0.04)
- Europe
  - Italy > Calabria
    - Catanzaro Province > Catanzaro (0.04)
  - Ireland > Leinster
    - County Dublin > Dublin (0.04)
  - Croatia > Dubrovnik-Neretva County
    - Dubrovnik (0.04)
- Asia
  - Singapore (0.04)
  - Myanmar > Tanintharyi Region
    - Dawei (0.04)
  - Middle East
    - Jordan (0.04)
    - UAE > Abu Dhabi Emirate
      - Abu Dhabi (0.04)

Genre:
- Overview (1.00)
- Research Report > Promising Solution (0.92)

Industry:
- Information Technology > Security & Privacy (1.00)
- Education (1.00)
- Health & Medicine > Therapeutic Area
  - Psychiatry/Psychology (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
