Controllable Dialogue Simulation with In-Context Learning

Li, Zekun, Chen, Wenhu, Li, Shiyang, Wang, Hong, Qian, Jing, Yan, Xifeng

Jun-5-2023–arXiv.org Artificial Intelligence

Building dialogue systems requires a large corpus of annotated dialogues. Such datasets are usually created via crowdsourcing, which is expensive and time-consuming. In this paper, we propose Dialogic, a novel dialogue simulation method that uses in-context learning with large language models to automate dataset creation. Seeded with a few annotated dialogues, Dialogic automatically selects in-context examples for demonstration and prompts GPT-3 to generate new dialogues and annotations in a controllable way. Our method can rapidly expand a small set of dialogue data with minimal or no human involvement and without parameter updates, and is thus much more cost-efficient and time-saving than crowdsourcing. Experimental results on the MultiWOZ dataset demonstrate that, in challenging low-resource settings with as few as 85 seed dialogues, training a model on the simulated dialogues yields even better performance than training on the same number of human-generated dialogues. When enough data is available, our method can still serve as an effective data augmentation method. Human evaluation results also show that our simulated dialogues have near-human fluency and annotation accuracy. The code and data are available at https://github.com/Leezekun/dialogic.
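The select-then-prompt step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `SeedDialogue` type, the Jaccard overlap on goal slots used for selection, and the prompt layout are all assumptions made for illustration, and the call to the language model is left out entirely.

```python
from dataclasses import dataclass


@dataclass
class SeedDialogue:
    # Hypothetical container for one annotated seed dialogue.
    goal: dict[str, str]          # e.g. {"hotel-area": "north"}
    turns: list[tuple[str, str]]  # (speaker, utterance) pairs


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two slot-name sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_examples(seeds, target_goal, k=2):
    """Pick the k seeds whose goal slots best overlap the target goal.

    Assumption: similarity is measured on slot names only.
    """
    target_slots = set(target_goal)
    ranked = sorted(
        seeds,
        key=lambda d: jaccard(set(d.goal), target_slots),
        reverse=True,
    )
    return ranked[:k]


def render(goal, turns):
    """Serialize a goal and its turns into one text block."""
    goal_text = ", ".join(f"{slot}={value}" for slot, value in sorted(goal.items()))
    lines = [f"Goal: {goal_text}", "Dialogue:"]
    lines += [f"{speaker}: {utterance}" for speaker, utterance in turns]
    return "\n".join(lines)


def build_prompt(seeds, target_goal, k=2):
    """Concatenate the selected demonstrations, then the open target goal."""
    demos = [render(d.goal, d.turns) for d in select_examples(seeds, target_goal, k)]
    # The target block has no turns, so the model is left to continue it.
    return "\n\n".join(demos + [render(target_goal, [])])
```

Calling `build_prompt(seeds, {"hotel-area": "north", "hotel-stars": "4"})` would return a single string that ends with the new goal and an empty `Dialogue:` header, ready to be sent to whichever completion model is in use. Controlling the goal in the final block is what makes the generated dialogue steerable in this sketch.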

large language model, machine learning, natural language


Country:
- North America > United States
  - California > Santa Barbara County > Santa Barbara (0.04)
- Europe
  - United Kingdom > England
    - Leicestershire > Leicester (0.04)
  - Italy > Tuscany
    - Florence (0.04)
- Asia > Middle East
  - Jordan (0.04)

Genre:
- Research Report (1.00)

Industry:
- Consumer Products & Services (0.93)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language
    - Large Language Model (1.00)
    - Discourse & Dialogue (0.68)
  - Machine Learning > Neural Networks
    - Deep Learning (0.39)
