Use Random Selection for Now: Investigation of Few-Shot Selection Strategies in LLM-based Text Augmentation for Classification
Jan Cegin, Branislav Pecher, Jakub Simko, Ivan Srba, Maria Bielikova, Peter Brusilovsky
arXiv.org Artificial Intelligence
Generative large language models (LLMs) are increasingly used for data augmentation tasks, where text samples are paraphrased (or generated anew) and then used for classifier fine-tuning. Existing works on augmentation leverage few-shot scenarios, where samples are given to LLMs as part of prompts, leading to better augmentations. Yet, the samples are mostly selected randomly, and a comprehensive overview of the effects of other (more "informed") sample selection strategies is lacking. In this work, we compare sample selection strategies from the few-shot learning literature and investigate their effects in LLM-based textual augmentation. We evaluate them on in-distribution and out-of-distribution classifier performance. Results indicate that while some "informed" selection strategies increase model performance, especially on out-of-distribution data, this happens only seldom and with marginal gains. Unless further advances are made, a default of random sample selection remains a good option for augmentation practitioners.
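To make the compared setups concrete, here is a minimal sketch of few-shot exemplar selection for an augmentation prompt. It is not the paper's implementation: the function names, the prompt wording, and the "longest-first" stand-in for an informed strategy are all illustrative assumptions (real informed strategies score samples by e.g. similarity or diversity).

```python
import random

def select_few_shot(pool, k, strategy="random", seed=0):
    """Pick k exemplar texts from a labelled pool to include in the prompt.

    'random' is the common default the abstract refers to; 'longest' is a
    toy proxy for an 'informed' strategy that ranks samples by a score.
    """
    rng = random.Random(seed)
    if strategy == "random":
        return rng.sample(pool, k)
    if strategy == "longest":  # hypothetical informed strategy: score = length
        return sorted(pool, key=len, reverse=True)[:k]
    raise ValueError(f"unknown strategy: {strategy}")

def build_prompt(target, exemplars):
    """Assemble a paraphrasing prompt with the selected exemplars."""
    shots = "\n".join(f"- {t}" for t in exemplars)
    return (
        "Paraphrase the text below, keeping its class label.\n"
        f"Example texts from the same class:\n{shots}\n"
        f"Text: {target}\nParaphrase:"
    )
```

Swapping the `strategy` argument is the only change needed to compare selection methods, which mirrors how such strategies can be evaluated against the random baseline.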
Oct-14-2024
- Country:
  - Asia
    - China > Guangxi Province
      - Nanning (0.04)
    - Middle East > UAE
      - Abu Dhabi Emirate > Abu Dhabi (0.04)
    - Singapore (0.05)
    - Thailand > Bangkok
      - Bangkok (0.04)
  - Europe
    - Czechia > South Moravian Region
      - Brno (0.04)
    - Ireland > Leinster
      - County Dublin > Dublin (0.04)
    - Slovakia > Bratislava
      - Bratislava (0.04)
  - North America
    - Canada > Ontario
      - Toronto (0.04)
    - Dominican Republic (0.04)
    - United States > New York
      - New York County > New York City (0.04)
- Genre:
  - Research Report
    - Experimental Study (0.67)
    - New Finding (0.93)
- Industry:
  - Education (0.34)