Multi-Document Grounded Multi-Turn Synthetic Dialog Generation

Lee, Young-Suk; Gunasekara, Chulaka; Contractor, Danish; Astudillo, Ramón Fernandez; Florian, Radu

arXiv.org Artificial Intelligence 

As instruction-tuned language models have proven highly effective at generalizing to new tasks (Chung et al., 2022; Wei et al., 2021; Ouyang et al., 2022; Mishra et al., 2022; Wang et al., 2022b), there has been growing interest in acquiring synthetic data sets generated from pre-trained language models with minimal or no human supervision (Honovich et al., 2022; Wang et al., 2023; Xu et al., 2023; Lee et al., 2023). While there has been an exploration of synthetic data generation for persona-grounded dialog generation (Jang et al., 2022; Bao et al., 2023),

For multi-document grounded dialog generation, user queries and agent answers are based on top-k retrieved passages. In particular, we generate an initial user query from a single document source and generate the agent answer from the top-k passages retrieved on the initial user query. Subsequent user queries and all agent answers are grounded on the retrieved passages and dialog history. We use a series of carefully designed prompts to ensure that generated agent answers remain meaningful in the presence of retrieved passages, which are often noisier than human-generated documents.
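The generation loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the word-overlap retriever, the prompt wording, the `stub_generate` stand-in for a language model, and the choice to re-retrieve passages on every new user query are all hypothetical, since the text does not specify them.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: any function that maps a prompt to generated text.
Generator = Callable[[str], str]


def retrieve_top_k(query: str, passages: List[str], k: int) -> List[str]:
    """Toy retriever (assumption): rank passages by word overlap with the query."""
    query_words = set(query.lower().split())
    # sorted() is stable, so passages with equal overlap keep their corpus order.
    ranked = sorted(
        passages,
        key=lambda p: len(query_words & set(p.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def stub_generate(prompt: str) -> str:
    """Stand-in for a language model call (assumption); returns placeholder text."""
    return f"<generated text for a prompt of {len(prompt)} characters>"


def generate_dialog(
    seed_document: str,
    passages: List[str],
    generate: Generator,
    num_turns: int = 2,
    k: int = 2,
) -> List[Tuple[str, str]]:
    """Build a dialog as a list of (user query, agent answer) pairs."""
    dialog: List[Tuple[str, str]] = []

    # The initial user query is generated from a single source document.
    query = generate(
        f"Document:\n{seed_document}\nWrite a user query about this document."
    )
    retrieved = retrieve_top_k(query, passages, k)

    for turn in range(num_turns):
        history = "\n".join(f"User: {q}\nAgent: {a}" for q, a in dialog)
        context = "\n".join(retrieved)

        if turn > 0:
            # Subsequent queries are grounded on retrieved passages and history.
            query = generate(
                f"Passages:\n{context}\nHistory:\n{history}\n"
                "Write the next user query."
            )
            # Assumption: retrieval is repeated for each new user query.
            retrieved = retrieve_top_k(query, passages, k)
            context = "\n".join(retrieved)

        # Every agent answer is grounded on retrieved passages and history.
        answer = generate(
            f"Passages:\n{context}\nHistory:\n{history}\n"
            f"User: {query}\nWrite the agent answer."
        )
        dialog.append((query, answer))

    return dialog


if __name__ == "__main__":
    corpus = ["cats like milk", "dogs chase cats", "birds fly"]
    for user_query, agent_answer in generate_dialog(corpus[0], corpus, stub_generate):
        print("User:", user_query)
        print("Agent:", agent_answer)
```

In practice `stub_generate` would be replaced by a call to a pre-trained language model and `retrieve_top_k` by a real passage retriever; the control flow is the only part this sketch is meant to convey.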
