GlobalWoZ: Globalizing MultiWoZ to Develop Multilingual Task-Oriented Dialogue Systems
Ding, Bosheng, Hu, Junjie, Bing, Lidong, Aljunied, Sharifah Mahani, Joty, Shafiq, Si, Luo, Miao, Chunyan
arXiv.org Artificial Intelligence
Much recent progress in task-oriented dialogue (ToD) systems has been driven by available annotation data across multiple domains for training. Over the last few years, there has been a move towards data curation for multilingual ToD systems that can serve people speaking different languages. However, existing multilingual ToD datasets either have limited language coverage due to the high cost of data curation, or ignore the fact that the entities mentioned in the dialogues barely exist in the countries where these languages are spoken. To tackle these limitations, we introduce a novel data curation method that generates GlobalWoZ -- a large-scale multilingual ToD dataset globalized from an English ToD dataset for three unexplored use cases. Our method is based on translating dialogue templates and filling them with local entities in the target-language countries. We release our dataset as well as a set of strong baselines to encourage research on learning multilingual ToD systems for real use cases.
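As a rough illustration of the template-and-fill idea described in the abstract, the Python sketch below delexicalizes an English utterance into a template, looks up a translated version of that template, and fills its slots with entities from a target-language country. The bracketed placeholder syntax, the function names, the hard-coded translation table (standing in for a machine translation step), and the entity lists are all assumptions made for illustration; none of them is taken from the GlobalWoZ release.

```python
import random

# Hypothetical stand-in for a machine translation step: maps an English
# template to a target-language template, keeping slot placeholders intact.
TRANSLATED_TEMPLATES = {
    "I am looking for a hotel called [hotel-name] in [hotel-area].":
        "Estoy buscando un hotel llamado [hotel-name] en [hotel-area].",
}

# Hypothetical local entities for the target-language country.
LOCAL_ENTITIES = {
    "hotel-name": ["Hotel Prado", "Hostal Sol"],
    "hotel-area": ["el centro", "el norte"],
}


def delexicalize(utterance, slot_values):
    """Replace each annotated slot value with a [slot] placeholder."""
    template = utterance
    for slot, value in slot_values.items():
        template = template.replace(value, f"[{slot}]")
    return template


def fill_template(template, entities, rng):
    """Fill every placeholder found in the template with a sampled local entity.

    Returns the filled utterance and the slot values that were chosen,
    so the dialogue annotations can be updated to match.
    """
    filled = template
    chosen = {}
    for slot, candidates in entities.items():
        placeholder = f"[{slot}]"
        if placeholder in filled:
            chosen[slot] = rng.choice(candidates)
            filled = filled.replace(placeholder, chosen[slot])
    return filled, chosen


if __name__ == "__main__":
    english = "I am looking for a hotel called Acorn House in the north."
    annotations = {"hotel-name": "Acorn House", "hotel-area": "the north"}

    template = delexicalize(english, annotations)
    translated = TRANSLATED_TEMPLATES[template]
    utterance, new_annotations = fill_template(
        translated, LOCAL_ENTITIES, random.Random(0)
    )
    print(utterance)
    print(new_annotations)
```

Returning the chosen slot values alongside the filled utterance keeps the annotations in step with the new surface text, which is what a ToD dataset needs for training and evaluation.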
Oct-14-2021
- Country:
- North America > United States
- Pennsylvania (0.04)
- Wisconsin > Dane County
- Madison (0.04)
- Minnesota > Hennepin County
- Minneapolis (0.04)
- Europe
- Sweden > Stockholm
- Stockholm (0.04)
- Russia > Central Federal District
- Moscow Oblast > Moscow (0.04)
- Netherlands > North Holland
- Amsterdam (0.04)
- Spain > Valencian Community
- Valencia Province > Valencia (0.04)
- Denmark > Capital Region
- Copenhagen (0.04)
- Portugal > Lisbon
- Lisbon (0.04)
- Norway > Eastern Norway
- Oslo (0.04)
- Middle East > Republic of Türkiye
- Istanbul Province > Istanbul (0.04)
- Germany > Saarland
- Saarbrücken (0.04)
- Asia
- Singapore (0.04)
- Indonesia > Java
- Thailand > Bangkok
- Bangkok (0.04)
- Vietnam > Hồ Chí Minh City
- Hồ Chí Minh City (0.04)
- China > Shanghai
- Shanghai (0.05)
- South Korea > Seoul
- Seoul (0.04)
- Middle East
- Republic of Türkiye > Istanbul Province
- Istanbul (0.04)
- Israel > Tel Aviv District
- Tel Aviv (0.04)
- India > Karnataka
- Bengaluru (0.04)
- Japan > Honshū
- Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
- Africa > Middle East
- Egypt > Cairo Governorate > Cairo (0.04)
- Genre:
- Research Report (0.40)
- Technology:
- Information Technology
- Data Science > Data Quality (0.75)
- Artificial Intelligence
- Machine Learning (1.00)
- Speech (0.93)
- Natural Language
- Machine Translation (1.00)
- Discourse & Dialogue (0.68)