Discourse & Dialogue
Physiological signals could be the key to 'emotionally intelligent' AI, scientists say
Speech and language recognition technology is a rapidly developing field, which has led to the emergence of novel speech dialog systems, such as Amazon Alexa and Siri. A significant milestone in the development of dialog artificial intelligence (AI) systems is the addition of emotional intelligence. A system able to recognize the emotional states of the user, in addition to understanding language, would generate a more empathetic response, leading to a more immersive experience for the user. "Multimodal sentiment analysis" is a group of methods that constitute the gold standard for an AI dialog system with sentiment detection. These methods can automatically analyze a person's psychological state from their speech, voice color, facial expression, and posture and are crucial for human-centered AI systems.
"Do you follow me?": A Survey of Recent Approaches in Dialogue State Tracking
Jacqmin, Léo, Rojas-Barahona, Lina M., Favre, Benoit
While communicating with a user, a task-oriented dialogue system has to track the user's needs at each turn according to the conversation history. This process, called dialogue state tracking (DST), is crucial because it directly informs the downstream dialogue policy. DST has received a lot of interest in recent years, with the text-to-text paradigm emerging as the favored approach. In this review paper, we first present the task and its associated datasets. Then, considering a large number of recent publications, we identify highlights and advances of research in 2021-2022. Although neural approaches have enabled significant progress, we argue that some critical aspects of dialogue systems such as generalizability are still underexplored. To motivate future studies, we propose several research avenues.
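To make the task concrete, here is a minimal sketch of dialogue state tracking as slot-value bookkeeping. It is an illustration only, not a method from the survey: the slot names, the `update_state`, `serialize_state`, and `track` helpers, and the `slot=value` string format are all assumptions chosen for the example, and the per-turn slot values are given by hand rather than predicted by a model.

```python
from typing import Dict, List

# A dialogue state is modeled here as a mapping from slot name to value.
DialogueState = Dict[str, str]


def update_state(state: DialogueState, turn_slots: Dict[str, str]) -> DialogueState:
    """Return a new state in which this turn's slot values override earlier ones."""
    new_state = dict(state)
    new_state.update(turn_slots)
    return new_state


def serialize_state(state: DialogueState) -> str:
    """Flatten a state into one string, the kind of target a text-to-text model
    could be trained to emit (the exact format is an assumption)."""
    return "; ".join(f"{slot}={value}" for slot, value in sorted(state.items()))


def track(turns: List[Dict[str, str]]) -> List[DialogueState]:
    """Accumulate the state turn by turn and return the state after each turn."""
    state: DialogueState = {}
    history: List[DialogueState] = []
    for turn_slots in turns:
        state = update_state(state, turn_slots)
        history.append(state)
    return history


# Hypothetical slot values mentioned by the user in three consecutive turns.
turns = [
    {"restaurant-food": "italian"},
    {"restaurant-area": "centre"},
    {"restaurant-food": "thai"},  # the user changes their mind
]
states = track(turns)
print(serialize_state(states[-1]))  # restaurant-area=centre; restaurant-food=thai
```

In a real tracker the per-turn slot values would come from a model reading the conversation history; the sketch only shows how the state carries over and is overwritten across turns.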
Interactive Evaluation of Dialog Track at DSTC9
Mehri, Shikib, Feng, Yulan, Gordon, Carla, Alavi, Seyed Hossein, Traum, David, Eskenazi, Maxine
The ultimate goal of dialog research is to develop systems that can be effectively used in interactive settings by real users. To this end, we introduced the Interactive Evaluation of Dialog Track at the 9th Dialog System Technology Challenge. This track consisted of two sub-tasks. The first sub-task involved building knowledge-grounded response generation models. The second sub-task aimed to extend dialog models beyond static datasets by assessing them in an interactive setting with real users. Our track challenges participants to develop strong response generation models and explore strategies that extend them to back-and-forth interactions with real users. The progression from static corpora to interactive evaluation introduces unique challenges and facilitates a more thorough assessment of open-domain dialog systems. This paper provides an overview of the track, including the methodology and results. Furthermore, it provides insights into how to best evaluate open-domain dialog models.
Controllable User Dialogue Act Augmentation for Dialogue State Tracking
Lai, Chun-Mao, Hsu, Ming-Hao, Huang, Chao-Wei, Chen, Yun-Nung
Prior work has demonstrated that data augmentation is useful for improving dialogue state tracking. However, there are many types of user utterances, while the prior method only considered the simplest one for augmentation, raising concerns about poor generalization capability. In order to better cover diverse dialogue acts and control the generation quality, this paper proposes controllable user dialogue act augmentation (CUDA-DST) to augment user utterances with diverse behaviors. With the augmented data, different state trackers gain improvement and show better robustness, achieving state-of-the-art performance on MultiWOZ 2.1.
A Survey of Intent Classification and Slot-Filling Datasets for Task-Oriented Dialog
Indeed, commercial task-oriented dialog systems in the form of smart devices like Amazon's Alexa are used by millions of people every day. Within the academic research community, however, task-oriented dialog system models are often benchmarked on relatively few evaluation datasets. This is in spite of the fact that the past few years have seen a substantial growth in the number of available datasets for building and evaluating intent classification and slot-filling models for task-oriented dialog systems. Thus, the goal of this survey is to catalog these intent classification and slot-filling datasets to help facilitate their use in building and evaluating dialog systems and beyond. Other surveys have discussed dialog datasets in depth (Serban et al. 2018), but exclude almost all intent classification and slot-filling datasets, and model-focused surveys on dialog systems mostly focus on models and pay much less attention to datasets.
Advancing Semi-Supervised Task Oriented Dialog Systems by JSA Learning of Discrete Latent Variable Models
Cai, Yucheng, Liu, Hong, Ou, Zhijian, Huang, Yi, Feng, Junlan
Developing semi-supervised task-oriented dialog (TOD) systems by leveraging unlabeled dialog data has attracted increasing interest. For semi-supervised learning of latent state TOD models, variational learning is often used, but it suffers from the high variance of the gradients propagated through discrete latent variables and from the drawback of indirectly optimizing the target log-likelihood. Recently, an alternative algorithm, called joint stochastic approximation (JSA), has emerged for learning discrete latent variable models with impressive performance. In this paper, we propose to apply JSA to semi-supervised learning of the latent state TOD models, which is referred to as JSA-TOD. To our knowledge, JSA-TOD represents the first work in developing JSA-based semi-supervised learning of discrete latent variable conditional models for long sequential generation problems such as those in TOD systems. Extensive experiments show that JSA-TOD significantly outperforms its variational learning counterpart. Remarkably, semi-supervised JSA-TOD using 20% labels performs close to the fully supervised baseline on MultiWOZ2.1.
Dynamic Planning in Open-Ended Dialogue using Reinforcement Learning
Cohen, Deborah, Ryu, Moonkyung, Chow, Yinlam, Keller, Orgad, Greenberg, Ido, Hassidim, Avinatan, Fink, Michael, Matias, Yossi, Szpektor, Idan, Boutilier, Craig, Elidan, Gal
Despite recent advances in natural language understanding and generation, and decades of research on the development of conversational bots, building automated agents that can carry on rich open-ended conversations with humans "in the wild" remains a formidable challenge. In this work we develop a real-time, open-ended dialogue system that uses reinforcement learning (RL) to power a bot's conversational skill at scale. Our work pairs the succinct embedding of the conversation state generated using SOTA (supervised) language models with RL techniques that are particularly suited to a dynamic action space that changes as the conversation progresses. Trained using crowd-sourced data, our novel system substantially exceeds the (strong) baseline supervised model with respect to several metrics of interest in a live experiment with real users of the Google Assistant.
Post-processing Networks: Method for Optimizing Pipeline Task-oriented Dialogue Systems using Reinforcement Learning
Ohashi, Atsumoto, Higashinaka, Ryuichiro
Many studies have proposed methods for optimizing the dialogue performance of an entire pipeline task-oriented dialogue system by jointly training modules in the system using reinforcement learning. However, these methods are limited in that they can only be applied to modules implemented using trainable neural-based methods. To solve this problem, we propose a method for optimizing the dialogue performance of a pipeline system composed of modules implemented with arbitrary methods. With our method, neural-based components called post-processing networks (PPNs) are installed inside such a system to post-process the output of each module. All PPNs are updated to improve the overall dialogue performance of the system by using reinforcement learning, without requiring each module to be differentiable. Through dialogue simulation and human evaluation on the MultiWOZ dataset, we show that our method can improve the dialogue performance of pipeline systems consisting of various modules.
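The structural idea of inserting a post-processor after each module can be sketched as follows. This is a rough illustration under strong assumptions, not the authors' implementation: the toy modules, the `run_pipeline` helper, and the hand-written `fix_unknown_intent` rule are all hypothetical, and the rule merely stands in for a trained PPN. The reinforcement learning update is omitted entirely.

```python
from typing import Callable, List, Tuple

# Each stage pairs an arbitrary (possibly non-differentiable) module
# with a post-processor applied to that module's output.
Module = Callable[[object], object]


def identity(value):
    """Post-processor that leaves a module's output unchanged."""
    return value


def run_pipeline(stages: List[Tuple[Module, Module]], user_input):
    """Run each module, then its post-processor, and feed the result onward."""
    value = user_input
    for module, post_processor in stages:
        value = post_processor(module(value))
    return value


def toy_nlu(text: str) -> dict:
    """Hypothetical rule-based NLU that only recognizes the word 'want'."""
    return {"intent": "inform" if "want" in text else "unknown", "text": text}


def toy_policy(frame: dict) -> str:
    """Hypothetical policy that picks a system action from the intent."""
    return "request_area" if frame["intent"] == "inform" else "ask_repeat"


def fix_unknown_intent(frame: dict) -> dict:
    """Hand-written stand-in for a learned PPN: repairs one kind of NLU miss."""
    if frame["intent"] == "unknown" and "looking for" in frame["text"]:
        return {**frame, "intent": "inform"}
    return frame


utterance = "I am looking for a restaurant"
plain = [(toy_nlu, identity), (toy_policy, identity)]
with_post_processing = [(toy_nlu, fix_unknown_intent), (toy_policy, identity)]

print(run_pipeline(plain, utterance))                 # ask_repeat
print(run_pipeline(with_post_processing, utterance))  # request_area
```

The point of the sketch is that the modules themselves are untouched; only the post-processors sit in the data path, which is where a trainable component could be placed.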
DialCrowd 2.0: A Quality-Focused Dialog System Crowdsourcing Toolkit
Huynh, Jessica, Chiang, Ting-Rui, Bigham, Jeffrey, Eskenazi, Maxine
Dialog system developers need high-quality data to train, fine-tune and assess their systems. They often use crowdsourcing for this since it provides large quantities of data from many workers. However, the data may not be of sufficiently good quality. This can be due to the way that the requester presents a task and how they interact with the workers. This paper introduces DialCrowd 2.0 to help requesters obtain higher quality data by, for example, presenting tasks more clearly and facilitating effective communication with workers. DialCrowd 2.0 guides developers in creating improved Human Intelligence Tasks (HITs) and is directly applicable to the workflows used currently by developers and researchers.
UniDU: Towards A Unified Generative Dialogue Understanding Framework
Chen, Zhi, Chen, Lu, Chen, Bei, Qin, Libo, Liu, Yuncong, Zhu, Su, Lou, Jian-Guang, Yu, Kai
With the development of pre-trained language models, remarkable success has been witnessed in dialogue understanding (DU). However, current DU approaches usually employ independent models for each distinct DU task without considering shared knowledge across different DU tasks. In this paper, we propose a unified generative dialogue understanding framework, named UniDU, to achieve effective information exchange across diverse DU tasks. Here, we reformulate all DU tasks into a unified prompt-based generative model paradigm. More importantly, a novel model-agnostic multi-task training strategy (MATS) is introduced to dynamically adapt the weights of diverse tasks for best knowledge sharing during training, based on the nature and available data of each task. Experiments on ten DU datasets covering five fundamental DU tasks show that the proposed UniDU framework largely outperforms task-specific well-designed methods on all tasks. MATS also reveals the knowledge-sharing structure of these tasks. Finally, UniDU obtains promising performance in the unseen dialogue domain, showing the great potential for generalization.