SEADialogues: A Multilingual Culturally Grounded Multi-turn Dialogue Dataset on Southeast Asian Languages
Kautsar, Muhammad Dehan Al; Candra, Aswin; Hakim, Muhammad Alif Al; Kahfi, Maxalmina Satria; Koto, Fajri; Aji, Alham Fikri; Limkonchotiwat, Peerat; Chuangsuwanich, Ekapol; Winata, Genta Indra
arXiv.org Artificial Intelligence
Although numerous datasets have been developed to support dialogue systems, most existing chit-chat datasets overlook the cultural nuances inherent in natural human conversations. To address this gap, we introduce SEADialogues, a culturally grounded dialogue dataset centered on Southeast Asia, a region with over 700 million people and immense cultural diversity. Our dataset features dialogues in eight languages from six Southeast Asian countries, many of which are low-resource despite having sizable speaker populations. To enhance cultural relevance and personalization, each dialogue includes persona attributes and two culturally grounded topics that reflect everyday life in the respective communities. Furthermore, we release a multi-turn dialogue dataset to advance research on culturally aware and human-centric large language models, including conversational dialogue agents.
Aug-12-2025
- Country:
- Asia
- Indonesia
- Middle East
- Republic of Türkiye > Ankara Province
- Ankara (0.04)
- UAE > Dubai Emirate
- Dubai (0.04)
- Singapore (0.04)
- Southeast Asia (0.24)
- Thailand
- Bangkok > Bangkok (0.04)
- Chiang Mai > Chiang Mai (0.04)
- Songkhla > Songkhla (0.04)
- Genre:
- Research Report (0.81)
- Industry:
- Leisure & Entertainment (1.00)
- Media > Film (0.46)
- Technology: