Contextual Data Augmentation for Task-Oriented Dialog Systems
Dustin Axman, Avik Ray, Shubham Garg, Jing Huang
arXiv.org Artificial Intelligence
Virtual assistants (e.g. Alexa, Siri, Google Assistant) are able to accomplish various tasks by interacting with users via natural language conversation. Task-oriented dialog models form the core technology behind these applications: they understand users' natural language utterances [1, 2], keep track of the conversation [3, 4], perform requested tasks (e.g. API calls) [5, 6], and generate appropriate, meaningful responses to the user [7, 8]. Training neural task-oriented dialog models [9, 10, 11] requires a large amount of annotated data, which is difficult for model developers to obtain. While crowd-sourcing and dialog simulation based on agent interplay [12, 13] address this issue to a certain extent, these approaches are slow and do not provide sufficient coverage of the surface-form variations of natural language (NL) user turns. Recently, large pre-trained language models (e.g. GPT-2 [14], T5 [15]) have been successfully used to generate fluent agent dialog responses, both with dialog context [16, 8, 17] and without it [18, 19]. However, it is unclear whether similar models can capture the large variation in the user-turn distribution of such task-oriented dialogs. Previous work on data augmentation for spoken language understanding has largely focused on generating paraphrases of a user utterance with a specific goal and set of entities [20, 21, 22]. However, such utterances again fail to provide sufficient coverage of the large semantic space possible between dialog turns, and may not improve the performance of downstream task-oriented dialog systems.
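To make the contextual-generation setting concrete, the sketch below shows one common way dialog context plus target slot values can be serialized into a single conditioning string for a pretrained language model (e.g. GPT-2 or T5) to complete into a new user turn. The function name, speaker tags, and slot format are illustrative assumptions, not the paper's actual method.

```python
# Hypothetical sketch: flatten a dialog history and target slots into a
# prompt that a pretrained LM could complete to generate a new user turn.
# Tag names (<user>, <agent>, <slots>) are illustrative placeholders.

def build_augmentation_prompt(history, slots):
    """Serialize (speaker, utterance) turns and target slot values
    into one conditioning string ending at the user-turn position."""
    context = " ".join(f"<{speaker}> {utt}" for speaker, utt in history)
    slot_str = ", ".join(f"{k}={v}" for k, v in slots.items())
    return f"{context} <slots> {slot_str} <user>"

history = [
    ("user", "I need a table for two tonight."),
    ("agent", "What cuisine would you like?"),
]
prompt = build_augmentation_prompt(history, {"cuisine": "italian"})
# An LM conditioned on `prompt` would then be sampled to produce a
# contextually consistent user turn, e.g. "Italian, please."
```

In practice the sampled completion would be filtered for fluency and slot consistency before being added to the training set; conditioning on the full dialog context (rather than a single utterance, as in paraphrase-based augmentation) is what allows the generated turns to cover the wider semantic space between turns that the abstract describes.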
Oct-16-2023