Training Generative Question-Answering on Synthetic Data Obtained from an Instruct-tuned Model

Takahashi, Kosuke, Omi, Takahiro, Arima, Kosuke, Ishigaki, Tatsuya

Oct-12-2023–arXiv.org Artificial Intelligence

This paper presents a simple and cost-effective method for synthesizing data to train question-answering systems. Fine-tuning GPT models is common practice in resource-rich languages such as English; however, it becomes challenging for non-English languages because question-answer (QA) pairs are scarce. Existing approaches use question and answer generators trained on human-authored QA pairs, which incurs substantial human cost. In contrast, we use an instruct-tuned model to generate QA pairs in a zero-shot or few-shot manner. We conduct experiments to compare various strategies for obtaining QA pairs from the instruct-tuned model. The results demonstrate that a model trained on our proposed synthetic data achieves performance comparable to a model trained on manually curated datasets, without incurring human costs.
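To make the zero-shot versus few-shot distinction concrete, here is a minimal sketch of how a QA-generation prompt might be assembled and how the model's reply might be parsed. The prompt wording, the `Q:`/`A:` output format, and the function names `build_prompt` and `parse_qa_pairs` are illustrative assumptions, not the prompts or code used in the paper, and the call to the instruct-tuned model itself is left out.

```python
import re
from typing import List, Optional, Tuple

QAPair = Tuple[str, str]


def build_prompt(context: str, n_pairs: int = 3,
                 examples: Optional[List[QAPair]] = None) -> str:
    """Build a zero-shot prompt, or a few-shot one when examples are given.

    The instruction wording is a hypothetical placeholder.
    """
    lines = [
        f"Read the passage and write {n_pairs} question-answer pairs about it.",
        "Format each pair as 'Q: <question>' followed by 'A: <answer>'.",
        "",
    ]
    # Few-shot: prepend demonstration pairs; zero-shot: this loop is skipped.
    for question, answer in examples or []:
        lines += [f"Q: {question}", f"A: {answer}", ""]
    lines += ["Passage:", context.strip()]
    return "\n".join(lines)


# Matches one 'Q: ...' line followed by its 'A: ...' answer (assumed format).
_QA_PATTERN = re.compile(r"Q:\s*(.+?)\s*\nA:\s*(.+?)\s*(?=\nQ:|\Z)", re.DOTALL)


def parse_qa_pairs(generated: str) -> List[QAPair]:
    """Extract (question, answer) tuples from the model's raw text output."""
    return [(q.strip(), a.strip())
            for q, a in _QA_PATTERN.findall(generated.strip())]
```

The parsed pairs could then serve as synthetic training data for fine-tuning a QA model; passing `examples=None` corresponds to the zero-shot setting and passing a short list of demonstration pairs to the few-shot setting.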

Genre:
- Research Report > New Finding (0.89)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language
    - Question Answering (1.00)
    - Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.90)
