A Survey on Recent Advances in Conversational Data Generation
Heydar Soudani, Roxana Petcu, Evangelos Kanoulas, Faegheh Hasibi
arXiv.org Artificial Intelligence
Recent advancements in conversational systems have significantly enhanced human-machine interaction across various domains. However, training these systems is challenging due to the scarcity of specialized dialogue data. Traditionally, conversational datasets have been created through crowdsourcing, which is costly, labor-intensive, and limited in scale. Synthetic dialogue data generation has emerged as a solution: it augments existing datasets or converts textual resources into conversational formats, providing a more efficient and scalable approach to dataset creation. In this survey, we offer a systematic and comprehensive review of multi-turn conversational data generation, focusing on three types of dialogue systems: open-domain, task-oriented, and information-seeking. We categorize existing research by key components such as seed data creation, utterance generation, and quality filtering, and introduce a general framework that outlines the main principles of conversational data generation systems. Additionally, we examine the evaluation metrics and methods for assessing synthetic conversational data, address current challenges in the field, and explore potential directions for future research. Our goal is to accelerate progress for researchers and practitioners by presenting an overview of state-of-the-art methods and highlighting opportunities for further research in this area.
May 12, 2024
- Country:
  - Oceania > Australia
  - North America
    - Dominican Republic (0.04)
    - United States
      - Maryland > Baltimore (0.04)
      - Michigan > Washtenaw County
        - Ann Arbor (0.04)
      - Minnesota > Hennepin County
        - Minneapolis (0.14)
      - Hawaii > Honolulu County
        - Honolulu (0.04)
      - Louisiana > Orleans Parish
        - New Orleans (0.04)
      - Washington > King County
        - Seattle (0.04)
      - Pennsylvania
        - Philadelphia County > Philadelphia (0.04)
        - Allegheny County > Pittsburgh (0.04)
      - California
        - San Diego County > San Diego (0.04)
        - Los Angeles County > Los Angeles (0.04)
      - New York > New York County
        - New York City (0.04)
    - Canada
  - Europe
    - France (0.04)
    - Netherlands > North Holland
      - Amsterdam (0.04)
    - Austria > Styria
      - Graz (0.04)
    - Spain > Valencian Community
      - Valencia Province > Valencia (0.04)
    - Italy > Tuscany
      - Florence (0.04)
    - United Kingdom > England
      - West Midlands > Birmingham (0.04)
    - Greece > Attica
      - Athens (0.04)
    - Belgium
      - Brussels-Capital Region > Brussels (0.04)
      - Flanders > Antwerp Province
        - Antwerp (0.04)
    - Ireland > Leinster
      - County Dublin > Dublin (0.04)
  - Asia
    - Singapore (0.04)
    - Indonesia > Bali (0.04)
    - China > Hong Kong (0.04)
    - Taiwan > Taiwan Province
      - Taipei (0.04)
    - Myanmar > Tanintharyi Region
      - Dawei (0.04)
    - Middle East
      - Jordan (0.04)
      - UAE > Abu Dhabi Emirate
        - Abu Dhabi (0.04)
  - Africa > Ethiopia
    - Addis Ababa > Addis Ababa (0.04)
- Genre:
  - Overview (1.00)
  - Research Report > Promising Solution (0.87)
- Industry:
  - Media (0.67)
  - Consumer Products & Services (0.46)
  - Leisure & Entertainment (0.46)
- Technology:
  - Information Technology
    - Knowledge Management (1.00)
    - Information Management (1.00)
    - Communications > Social Media (1.00)
    - Artificial Intelligence
      - Representation & Reasoning
        - Personal Assistant Systems (1.00)
        - Expert Systems (0.93)
      - Natural Language
        - Large Language Model (1.00)
        - Discourse & Dialogue (1.00)
        - Chatbot (1.00)
      - Machine Learning > Neural Networks
        - Deep Learning (1.00)