Guimarães, Victor
"Stupid robot, I want to speak to a human!" User Frustration Detection in Task-Oriented Dialog Systems
Caralt, Mireia Hernandez, Sekulić, Ivan, Carević, Filip, Khau, Nghia, Popa, Diana Nicoleta, Guedes, Bruna, Guimarães, Victor, Yang, Zeyu, Manso, Andre, Reddy, Meghana, Rosso, Paolo, Mathis, Roland
Detecting user frustration in modern-day task-oriented dialog (TOD) systems is imperative for maintaining overall user satisfaction, engagement, and retention. However, most recent research focuses on sentiment and emotion detection in academic settings, and thus fails to fully capture the implications of real-world user data. To close this gap, we focus on user frustration in a deployed TOD system, assessing the feasibility of out-of-the-box solutions for user frustration detection. Specifically, we compare the performance of our deployed keyword-based approach, open-source approaches to sentiment analysis, dialog breakdown detection methods, and emerging in-context learning LLM-based detection. Our analysis highlights the limitations of open-source methods for real-world frustration detection, while demonstrating the superior performance of the LLM-based approach, which achieves a 16% relative improvement in F1 score on an internal benchmark. Finally, we analyze the advantages and limitations of our methods and provide insights into the user frustration detection task for industry practitioners.
Reliable LLM-based User Simulator for Task-Oriented Dialogue Systems
Sekulić, Ivan, Terragni, Silvia, Guimarães, Victor, Khau, Nghia, Guedes, Bruna, Filipavicius, Modestas, Manso, André Ferreira, Mathis, Roland
The field of dialogue systems has seen a notable surge in the utilization of user simulation approaches, primarily for the evaluation and enhancement of conversational search systems (Owoicho et al., 2023) and task-oriented dialogue (TOD) systems (Terragni et al., 2023). User simulation plays a pivotal role in replicating the nuanced interactions of real users with these systems, enabling a wide range of applications such as synthetic data augmentation, error detection, and evaluation (Wan et al., 2022; Sekulić et al., 2022; Li et al., 2022; Balog and Zhai, 2023; Ji et al., 2022). In this paper, we introduce DAUS, a generative user simulator for TOD systems. As depicted in Figure 1, once initialized with the user goal description, DAUS engages with the system across multiple turns, providing information to fulfill the user's objectives. Our aim is to minimize the commonly observed user simulator hallucinations and incorrect responses (right-hand side of Figure 1), with an ultimate objective of enabling detection of common errors in TOD systems (left-hand side of Figure 1). Our approach is straightforward yet effective: we build upon the foundation of LLM-based
Dataflow Dialogue Generation
Meron, Joram, Guimarães, Victor
We demonstrate task-oriented dialogue generation within the dataflow dialogue paradigm. We show an example of agenda driven dialogue generation for the MultiWOZ domain, and an example of generation without an agenda for the SMCalFlow domain, where we show an improvement in the accuracy of the translation of user requests to dataflow expressions when the generated dialogues are used to augment the translation training dataset.
MultiWOZ-DF -- A Dataflow implementation of the MultiWOZ dataset
Meron, Joram, Guimarães, Victor
Semantic Machines (SM) have introduced the use of the dataflow (DF) paradigm to dialogue modelling, using computational graphs to hierarchically represent user requests, data, and the dialogue history [Semantic Machines et al. 2020]. Although the main focus of that paper was the SMCalFlow dataset (to date, the only dataset with "native" DF annotations), they also reported some results of an experiment using a version of the commonly used MultiWOZ dataset [Budzianowski et al. 2018] transformed into a DF format. In this paper, we expand the experiments using DF for the MultiWOZ dataset, exploring some additional experimental set-ups. The code and instructions to reproduce the experiments reported here have been released. The contributions of this paper are: 1.) A DF implementation capable of executing MultiWOZ dialogues; 2.) Several versions of the conversion of MultiWOZ into a DF format; 3.) Experimental results on state match and translation accuracy.