Training Generative Question-Answering on Synthetic Data Obtained from an Instruct-tuned Model
Takahashi, Kosuke, Omi, Takahiro, Arima, Kosuke, Ishigaki, Tatsuya
arXiv.org Artificial Intelligence
This paper presents a simple and cost-effective method for synthesizing data to train question-answering systems. Fine-tuning GPT models is common practice in resource-rich languages such as English, but it becomes challenging for non-English languages because question-answer (QA) pairs are scarce. Existing approaches use question and answer generators trained on human-authored QA pairs, which incurs substantial human cost. In contrast, we use an instruct-tuned model to generate QA pairs in a zero-shot or few-shot manner. We conduct experiments to compare various strategies for obtaining QA pairs from the instruct-tuned model. The results demonstrate that a model trained on our proposed synthetic data achieves performance comparable to that of a model trained on manually curated datasets, without incurring human costs.
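The zero-shot and few-shot generation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the prompt wording, the `Q:`/`A:` output format, and the names `build_prompt`, `parse_qa_pairs`, `synthesize`, and the `generate` callable (a stand-in for whichever instruct-tuned model is queried) are all assumptions made for illustration.

```python
import re
from typing import Callable, List, Sequence, Tuple

QAPair = Tuple[str, str]

# Assumed output format: each pair is a "Q: ..." line followed by an "A: ..." line.
_QA_RE = re.compile(r"Q:\s*(.+?)\s*\n\s*A:\s*(.+)")


def build_prompt(context: str, n_pairs: int, examples: Sequence[QAPair] = ()) -> str:
    """Build a zero-shot prompt (no examples) or a few-shot prompt (with examples)."""
    lines = [
        f"Write {n_pairs} question-answer pairs about the passage below.",
        "Use the format 'Q: ...' on one line and 'A: ...' on the next.",
    ]
    if examples:
        lines.append("Examples of the format:")
        for question, answer in examples:
            lines += [f"Q: {question}", f"A: {answer}"]
    lines += ["Passage:", context]
    return "\n".join(lines)


def parse_qa_pairs(text: str) -> List[QAPair]:
    """Extract (question, answer) tuples from the model's raw text output."""
    return [(q.strip(), a.strip()) for q, a in _QA_RE.findall(text)]


def synthesize(
    context: str,
    generate: Callable[[str], str],  # hypothetical wrapper around an instruct-tuned model
    n_pairs: int = 3,
    examples: Sequence[QAPair] = (),
) -> List[QAPair]:
    """Prompt the model once and keep at most `n_pairs` parsed QA pairs."""
    prompt = build_prompt(context, n_pairs, examples)
    return parse_qa_pairs(generate(prompt))[:n_pairs]
```

Passing an empty `examples` sequence gives the zero-shot variant; supplying a handful of QA pairs gives the few-shot variant. The resulting pairs would then serve as fine-tuning data for the QA model.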
Oct-12-2023
- Country:
  - North America
    - United States
      - Texas (0.04)
      - Pennsylvania (0.04)
      - Washington > King County
        - Seattle (0.04)
      - Louisiana > Orleans Parish
        - New Orleans (0.05)
    - Trinidad and Tobago > Trinidad
  - Europe
    - Italy > Calabria
      - Catanzaro Province > Catanzaro (0.04)
    - France > Provence-Alpes-Côte d'Azur
      - Bouches-du-Rhône > Marseille (0.04)
    - Belgium > Brussels-Capital Region
      - Brussels (0.04)
  - Asia > Japan
- Genre:
  - Research Report > New Finding (0.89)
- Technology: