Don't Stop the Multi-Party! On Generating Synthetic Multi-Party Conversations with Constraints
Penzo, Nicolò, Guerini, Marco, Lepri, Bruno, Glavaš, Goran, Tonelli, Sara
arXiv.org Artificial Intelligence
Multi-Party Conversations (MPCs) are widely studied across disciplines, with social media as a primary data source due to their accessibility. However, these datasets raise privacy concerns and often reflect platform-specific properties. For example, interactions between speakers may be limited due to rigid platform structures (e.g., threads, tree-like discussions), which yield overly simplistic interaction patterns (e.g., as a consequence of "reply-to" links). This work explores the feasibility of generating diverse MPCs with instruction-tuned Large Language Models (LLMs) by providing deterministic constraints such as dialogue structure and participants' stance. We investigate two complementary strategies of leveraging LLMs in this context: (i) LLMs as MPC generators, where we task the LLM to generate a whole MPC at once, and (ii) LLMs as MPC parties, where the LLM generates one turn of the conversation at a time, given the conversation history. We next introduce an analytical framework to evaluate compliance with the constraints, content quality, and interaction complexity for both strategies. Finally, we assess the quality of the obtained MPCs via human annotation and LLM-as-a-judge evaluations. We find stark differences among LLMs, with only some being able to generate high-quality MPCs. We also find that turn-by-turn generation yields better conformance to constraints and higher linguistic variability than generating MPCs in one pass. Nonetheless, our structural and qualitative evaluation indicates that both generation strategies can yield high-quality MPCs.
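The two strategies can be contrasted with a minimal sketch. Everything here is an illustrative assumption rather than the authors' implementation: the `TurnConstraint` fields, the prompt wording, the function names, and the `stub_llm` placeholder (which stands in for a real model call so the example runs offline) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class TurnConstraint:
    """Hypothetical per-turn constraint: who speaks, to whom, with which stance."""
    speaker: str
    addressee: str
    stance: str


Turn = Tuple[str, str]  # (speaker, text)


def one_pass_prompt(topic: str, plan: List[TurnConstraint]) -> str:
    """Strategy (i): build a single prompt that asks for the whole conversation."""
    lines = [f"Write a conversation about: {topic}", "Follow this plan exactly:"]
    for i, c in enumerate(plan, start=1):
        lines.append(f"{i}. {c.speaker} replies to {c.addressee} (stance: {c.stance})")
    return "\n".join(lines)


def turn_prompt(topic: str, history: List[Turn], c: TurnConstraint) -> str:
    """Strategy (ii): build a prompt for one turn, given the conversation so far."""
    shown = "\n".join(f"{s}: {t}" for s, t in history) or "(no messages yet)"
    return (
        f"Topic: {topic}\nConversation so far:\n{shown}\n"
        f"You are {c.speaker}. Reply to {c.addressee}. Your stance: {c.stance}."
    )


def generate_turn_by_turn(
    topic: str, plan: List[TurnConstraint], llm: Callable[[str], str]
) -> List[Turn]:
    """Call the model once per planned turn, feeding back the growing history."""
    history: List[Turn] = []
    for c in plan:
        text = llm(turn_prompt(topic, history, c)).strip()
        history.append((c.speaker, text))
    return history


def stub_llm(prompt: str) -> str:
    """Placeholder model (assumption); replace with a real LLM client call."""
    return f"[reply generated from a {len(prompt)}-character prompt]"


plan = [
    TurnConstraint("Alice", "everyone", "in favor"),
    TurnConstraint("Bob", "Alice", "against"),
    TurnConstraint("Carol", "Alice", "against"),
]
conversation = generate_turn_by_turn("remote work", plan, stub_llm)
for speaker, text in conversation:
    print(f"{speaker}: {text}")
```

In this sketch the turn-by-turn loop fixes the speaker order itself, so only the content of each turn depends on the model, whereas the one-pass prompt leaves it to the model to follow the whole plan. That difference is one plausible reading of why the two strategies might differ in constraint compliance.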
Feb-19-2025
- Country:
- South America > Chile
- Oceania > Australia
- North America
- United States
- Texas > Travis County
- Austin (0.04)
- New Mexico > Los Alamos County
- Los Alamos (0.04)
- Florida > Miami-Dade County
- Miami (0.04)
- California
- Los Angeles County > Los Angeles (0.14)
- Santa Clara County > Palo Alto (0.04)
- Mexico > Mexico City
- Mexico City (0.04)
- Europe
- Czechia > Prague (0.04)
- Italy > Trentino-Alto Adige/Südtirol
- Trentino Province > Trento (0.04)
- Germany > Bavaria
- Lower Franconia > Würzburg (0.04)
- France > Provence-Alpes-Côte d'Azur
- Alpes-Maritimes > Nice (0.04)
- Croatia > Dubrovnik-Neretva County
- Dubrovnik (0.04)
- Asia
- Singapore (0.04)
- China > Hong Kong (0.04)
- British Indian Ocean Territory > Diego Garcia (0.04)
- Thailand > Bangkok
- Bangkok (0.04)
- Middle East
- Jordan (0.04)
- UAE > Abu Dhabi Emirate
- Abu Dhabi (0.04)
- Saudi Arabia > Asir Province
- Abha (0.04)
- Genre:
- Research Report > New Finding (0.46)
- Industry:
- Information Technology > Security & Privacy (1.00)
- Energy (1.00)
- Technology: