Controllable Dialogue Simulation with In-Context Learning
Li, Zekun; Chen, Wenhu; Li, Shiyang; Wang, Hong; Qian, Jing; Yan, Xifeng
arXiv.org Artificial Intelligence
Building dialogue systems requires a large corpus of annotated dialogues. Such datasets are usually created via crowdsourcing, which is expensive and time-consuming. In this paper, we propose Dialogic, a novel dialogue simulation method that uses in-context learning with large language models to automate dataset creation. Seeded with a few annotated dialogues, Dialogic automatically selects in-context examples for demonstration and prompts GPT-3 to generate new dialogues and their annotations in a controllable way. Our method can rapidly expand a small set of dialogue data with minimal or no human involvement and no parameter updates, which makes it much more cost-efficient and time-saving than crowdsourcing. Experimental results on the MultiWOZ dataset show that, under challenging low-resource settings with as few as 85 seed dialogues, training a model on the simulated dialogues yields even better performance than training on the same amount of human-generated dialogues. When enough data is available, our method still serves as an effective data augmentation method. Human evaluation also shows that our simulated dialogues have near-human fluency and annotation accuracy. The code and data are available at https://github.com/Leezekun/dialogic.
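The abstract says Dialogic selects in-context examples from the seed set and then prompts GPT-3 with them. The following is a minimal sketch of that selection-and-prompting step, not the authors' implementation. It assumes (our assumption; the paper's actual selection criterion may differ) that seed dialogues are ranked by Jaccard overlap between the slots in a target user goal and the slots in each seed dialogue's goal. `SeedDialogue`, `jaccard`, `select_examples`, and `build_prompt` are hypothetical names, the prompt layout is illustrative, and the call to GPT-3 itself is omitted.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SeedDialogue:
    """Hypothetical container for one annotated seed dialogue."""
    goal: frozenset[str]  # slots in the user goal, e.g. {"hotel-area", "hotel-stars"}
    text: str             # dialogue turns with inline state annotations


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Slot-overlap similarity between two goals; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_examples(target_goal: frozenset[str],
                    seeds: list[SeedDialogue], k: int = 2) -> list[SeedDialogue]:
    """Return the k seeds whose goals overlap most with the target goal.

    sorted() is stable, so ties keep their original seed order.
    """
    ranked = sorted(seeds, key=lambda s: jaccard(target_goal, s.goal), reverse=True)
    return ranked[:k]


def build_prompt(target_goal: frozenset[str],
                 seeds: list[SeedDialogue], k: int = 2) -> str:
    """Concatenate the selected demonstrations, then the new goal.

    The prompt ends with an open "Dialogue:" slot that a language model
    would be asked to complete with a new annotated dialogue.
    """
    parts = []
    for ex in select_examples(target_goal, seeds, k):
        parts.append(f"Goal: {', '.join(sorted(ex.goal))}\nDialogue:\n{ex.text}\n")
    parts.append(f"Goal: {', '.join(sorted(target_goal))}\nDialogue:\n")
    return "\n".join(parts)


if __name__ == "__main__":
    # Toy seed set (invented for illustration, not MultiWOZ data).
    demo_seeds = [
        SeedDialogue(frozenset({"hotel-area", "hotel-stars"}),
                     "User: A 4-star hotel in the north, please. [hotel-stars=4, hotel-area=north]"),
        SeedDialogue(frozenset({"train-day", "train-destination"}),
                     "User: I need a train to Cambridge on Monday. [train-day=monday, train-destination=cambridge]"),
        SeedDialogue(frozenset({"hotel-area", "hotel-parking"}),
                     "User: Somewhere in the centre with free parking. [hotel-area=centre, hotel-parking=yes]"),
    ]
    goal = frozenset({"hotel-area", "hotel-stars", "hotel-parking"})
    print(build_prompt(goal, demo_seeds, k=2))
```

Ranking by goal overlap is one simple way to make the demonstrations resemble the dialogue being requested; in this toy run the two hotel dialogues are chosen and the train dialogue is skipped.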
Jun-5-2023
- Country:
- North America > United States
- California > Santa Barbara County > Santa Barbara (0.04)
- Europe
- United Kingdom > England
- Leicestershire > Leicester (0.04)
- Italy > Tuscany
- Florence (0.04)
- Asia > Middle East
- Jordan (0.04)
- Genre:
- Research Report (1.00)
- Industry:
- Consumer Products & Services (0.93)
- Technology: