DIALIGHT: Lightweight Multilingual Development and Evaluation of Task-Oriented Dialogue Systems with Large Language Models
Hu, Songbo, Wang, Xiaobin, Yuan, Zhangdie, Korhonen, Anna, Vulić, Ivan
–arXiv.org Artificial Intelligence
We present DIALIGHT, a toolkit for developing and evaluating multilingual Task-Oriented Dialogue (ToD) systems which facilitates systematic evaluations and comparisons between ToD systems using fine-tuning of Pretrained Language Models (PLMs) and those utilising the zero-shot and in-context learning capabilities of Large Language Models (LLMs). In addition to automatic evaluation, this toolkit features (i) a secure, user-friendly web interface for fine-grained human evaluation at both local utterance level and global dialogue level, and (ii) a microservice-based backend, improving efficiency and scalability. Our evaluations reveal that while PLM fine-tuning leads to higher accuracy and coherence, LLM-based systems excel in producing diverse and likeable responses. However, we also identify significant challenges of LLMs in adherence to task-specific instructions and generating outputs in multiple languages, highlighting areas for future research. We hope this open-sourced toolkit will serve as a valuable resource for researchers aiming to develop and properly evaluate multilingual ToD systems and will lower, currently still high, entry barriers in the field.
arXiv.org Artificial Intelligence
Jan-4-2024
- Country:
- Asia (0.67)
- Europe
- Spain (0.28)
- United Kingdom > England
- Cambridgeshire > Cambridge (0.28)
- North America
- Canada (0.28)
- United States (0.46)
- Genre:
- Research Report > New Finding (0.68)
- Industry:
- Information Technology > Security & Privacy (1.00)
- Technology: