STAR: Improving Low-Resource Information Extraction by Structure-to-Text Data Generation with Large Language Models
Ma, Mingyu Derek; Wang, Xiaoxuan; Kung, Po-Nien; Brantingham, P. Jeffrey; Peng, Nanyun; Wang, Wei
arXiv.org Artificial Intelligence
Information extraction (IE) tasks such as event extraction require an in-depth understanding of the output structure and of the dependencies between sub-tasks. These tasks rely heavily on task-specific training data in the form of (passage, target structure) pairs to reach reasonable performance. However, obtaining such data through human annotation is costly, which creates a pressing need for low-resource information extraction approaches that require minimal human labeling for real-world applications. Fine-tuning supervised models on synthesized training data would be a generalizable approach, but existing data generation methods either still rely on large-scale ground-truth data or cannot be applied to complicated IE tasks because of their poor performance. To address these challenges, we propose STAR, a data generation method that leverages Large Language Models (LLMs) to synthesize data instances from a limited number of seed demonstrations, thereby boosting low-resource information extraction performance. Our approach first generates target structures (Y) and then generates passages (X), with both steps performed by LLMs. We design fine-grained, step-by-step instructions to obtain the initial data instances. We further reduce errors and improve data quality through self-reflection error identification and self-refinement with iterative revision. Our experiments show that the data generated by STAR significantly improves the performance of low-resource event extraction and relation extraction tasks, even surpassing the effectiveness of human-curated data. Human assessment of data quality shows that STAR-generated data exhibits higher passage quality and aligns better with the task definitions than the human-curated data.
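The generate-Y-then-X pipeline with self-reflection and iterative revision can be sketched as a control loop. This is a minimal illustration under stated assumptions, not the authors' implementation: the four callables (`gen_structure`, `gen_passage`, `find_errors`, `revise`) are hypothetical stand-ins for LLM prompts, the `max_rounds` cap is an assumed parameter, and the toy rule-based helpers (a single attack event whose argument spans must appear verbatim in the passage) exist only to make the loop runnable without an LLM.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical target structure Y: role name -> argument span.
Structure = Dict[str, str]


@dataclass
class Instance:
    structure: Structure  # target structure Y
    passage: str          # passage X
    revisions: int = 0    # number of refinement rounds applied


def star_style_generate(
    gen_structure: Callable[[], Structure],
    gen_passage: Callable[[Structure], str],
    find_errors: Callable[[Structure, str], List[str]],
    revise: Callable[[Structure, str, List[str]], str],
    max_rounds: int = 3,
) -> Instance:
    """Sketch: generate Y, then X conditioned on Y, then reflect and revise X."""
    y = gen_structure()          # step 1: generate the target structure Y
    x = gen_passage(y)           # step 2: generate a passage X expressing Y
    rounds = 0
    while rounds < max_rounds:
        errors = find_errors(y, x)   # self-reflection: identify errors
        if not errors:
            break
        x = revise(y, x, errors)     # self-refinement: revise the passage
        rounds += 1
    return Instance(y, x, rounds)


# --- Toy stand-ins for LLM calls (hypothetical, for illustration only) ---
def toy_structure() -> Structure:
    return {"trigger": "attacked", "Attacker": "rebels", "Target": "the convoy"}


def toy_passage(y: Structure) -> str:
    # Deliberately omits the Target span so that a revision is needed.
    return f"The {y['Attacker']} {y['trigger']} at dawn."


def toy_find_errors(y: Structure, x: str) -> List[str]:
    # Flags every argument span that does not appear verbatim in the passage.
    return [f"missing span: {v}" for v in y.values() if v not in x]


def toy_revise(y: Structure, x: str, errors: List[str]) -> str:
    # Appends one sentence per missing span (a real system would re-prompt the LLM).
    extra = [f"The {role.lower()} was {span}." for role, span in y.items() if span not in x]
    return " ".join([x] + extra)


if __name__ == "__main__":
    inst = star_style_generate(toy_structure, toy_passage, toy_find_errors, toy_revise)
    print(inst.passage)    # The rebels attacked at dawn. The target was the convoy.
    print(inst.revisions)  # 1
```

Keeping each step behind its own callable makes it easy to swap the toy functions for real LLM prompts (for example, prompts seeded with the limited demonstrations) while the reflect-and-revise loop stays unchanged.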
Sep-30-2023
- Country:
- Oceania > Australia
- North America
- United States
- Washington > King County
- Seattle (0.04)
- Louisiana > Orleans Parish
- New Orleans (0.04)
- California > Los Angeles County
- Los Angeles (0.14)
- Canada > Ontario
- Toronto (0.04)
- Europe
- Portugal > Lisbon
- Lisbon (0.04)
- Ireland > Leinster
- County Dublin > Dublin (0.04)
- Denmark > Capital Region
- Copenhagen (0.04)
- Asia
- Malaysia (0.04)
- China (0.04)
- Middle East
- Syria (0.04)
- UAE > Abu Dhabi Emirate
- Abu Dhabi (0.04)
- Genre:
- Research Report (0.40)
- Instructional Material (0.34)
- Technology: