Takanobu, Ryuichi
Robustness Testing of Language Understanding in Dialog Systems
Liu, Jiexi, Takanobu, Ryuichi, Wen, Jiaxin, Wan, Dazhen, Nie, Weiran, Li, Hongyan, Li, Cheng, Peng, Wei, Huang, Minlie
Most language understanding models in dialog systems are trained on a small amount of annotated data and evaluated on a small set from the same distribution. However, these models can cause system failures or undesirable outputs when exposed to natural perturbations in practice. In this paper, we conduct a comprehensive evaluation and analysis of the robustness of natural language understanding models, and introduce three important aspects related to language understanding in real-world dialog systems: language variety, speech characteristics, and noise perturbation. We propose LAUG, a model-agnostic toolkit that approximates natural perturbations for testing robustness in dialog systems. LAUG assembles four data augmentation approaches covering the three aspects, and these reveal critical robustness issues in state-of-the-art models. The dataset augmented with LAUG can be used to facilitate future research on robustness testing of language understanding in dialog systems.
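The kind of robustness test described in this abstract can be illustrated with a minimal sketch. It is not LAUG's API: the perturbation (`swap_adjacent_chars`), the keyword-based `toy_intent_model`, and the `robustness_gap` helper are all hypothetical names introduced here, and the character swap is only one simple stand-in for noise perturbation.

```python
import random


def swap_adjacent_chars(text, rate, rng):
    """Swap neighboring letters with probability `rate` (a toy noise model)."""
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip the swapped pair so each letter moves at most once
        else:
            i += 1
    return "".join(chars)


def toy_intent_model(utterance):
    """Hypothetical stand-in for an NLU model: exact keyword lookup."""
    tokens = utterance.lower().split()
    if "hotel" in tokens:
        return "find_hotel"
    if "taxi" in tokens:
        return "book_taxi"
    return "unknown"


def accuracy(model, examples):
    """Fraction of (text, label) pairs the model labels correctly."""
    correct = sum(model(text) == label for text, label in examples)
    return correct / len(examples)


def robustness_gap(model, examples, perturb):
    """Accuracy on clean inputs minus accuracy on perturbed copies."""
    perturbed = [(perturb(text), label) for text, label in examples]
    return accuracy(model, examples) - accuracy(model, perturbed)


examples = [
    ("i need a hotel in the north", "find_hotel"),
    ("book a taxi to the station", "book_taxi"),
]
rng = random.Random(0)  # fixed seed so the perturbation is repeatable
gap = robustness_gap(
    toy_intent_model, examples, lambda t: swap_adjacent_chars(t, 0.3, rng)
)
print(f"accuracy drop under perturbation: {gap:.2f}")
```

Because the model is passed in as a plain callable, the same harness works for any model, which is the sense in which such a test is model-agnostic.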
ERICA: Improving Entity and Relation Understanding for Pre-trained Language Models via Contrastive Learning
Qin, Yujia, Lin, Yankai, Takanobu, Ryuichi, Liu, Zhiyuan, Li, Peng, Ji, Heng, Huang, Minlie, Sun, Maosong, Zhou, Jie
Pre-trained Language Models (PLMs) have shown strong performance in various downstream Natural Language Processing (NLP) tasks. However, PLMs still cannot capture the factual knowledge in text well, which is crucial for understanding the whole text, especially in document-level language understanding tasks. To address this issue, we propose ERICA, a novel contrastive learning framework applied in the pre-training phase, to obtain a deeper understanding of entities and their relations in text. Specifically, (1) to better understand entities, we propose an entity discrimination task that distinguishes which tail entity can be inferred from the given head entity and relation; (2) to better understand relations, we employ a relation discrimination task that distinguishes whether two entity pairs are close in relational semantics. Experimental results demonstrate that ERICA achieves consistent improvements on several document-level language understanding tasks, including relation extraction and reading comprehension, especially in low-resource settings. Meanwhile, ERICA achieves comparable or better performance on sentence-level tasks. We will release the datasets, source code, and pre-trained language models for further research.
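The relation discrimination idea can be illustrated with a generic contrastive objective. The abstract does not state the exact loss, so the sketch below assumes an InfoNCE-style formulation over cosine similarities; `info_nce`, the temperature value, and the toy vectors are assumptions for illustration, not the paper's implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss: -log softmax score of the positive among all candidates.

    `anchor` and `positive` stand for representations of two entity pairs
    that share a relation; `negatives` are pairs with other relations.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with the max subtracted for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - logits[0]


# Toy 2-d "relation representations" (hypothetical values).
same_relation = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
different_relation = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
print(same_relation < different_relation)
```

Minimizing this loss pulls pairs with the same relation together and pushes pairs with different relations apart, which is one way to encode whether two entity pairs are close in relational semantics.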
Is Your Goal-Oriented Dialog Model Performing Really Well? Empirical Analysis of System-wise Evaluation
Takanobu, Ryuichi, Zhu, Qi, Li, Jinchao, Peng, Baolin, Gao, Jianfeng, Huang, Minlie
There is growing interest in developing goal-oriented dialog systems that help users accomplish complex tasks through multi-turn conversations. Although many methods have been devised to evaluate and improve the performance of individual dialog components, comprehensive empirical studies of how different components contribute to the overall performance of a dialog system are lacking. In this paper, we perform a system-wise evaluation and present an empirical analysis of different types of dialog systems, which are composed of different modules in different settings. Our results show that (1) a pipeline dialog system trained using fine-grained supervision signals at different component levels often obtains better performance than systems that use joint or end-to-end models trained on coarse-grained labels, (2) component-wise, single-turn evaluation results are not always consistent with the overall performance of a dialog system, and (3) despite the discrepancy between simulators and human users, simulated evaluation is still a valid alternative to costly human evaluation, especially in the early stage of development.
Deep Conversational Recommender in Travel
Liao, Lizi, Takanobu, Ryuichi, Ma, Yunshan, Yang, Xun, Huang, Minlie, Chua, Tat-Seng
When traveling to a foreign country, we are often in dire need of an intelligent conversational agent that provides instant and informative responses to our various queries. However, building such a travel agent is non-trivial. First, travel naturally involves several sub-tasks such as hotel reservation, restaurant recommendation, and taxi booking, which calls for global topic control. Second, the agent should consider various constraints given by the user, such as price or distance, to recommend an appropriate venue. In this paper, we present a Deep Conversational Recommender (DCR) and apply it to the travel domain. It augments sequence-to-sequence (seq2seq) models with a neural latent topic component to better guide response generation and make training easier. To account for the various constraints in venue recommendation, we leverage a graph convolutional network (GCN) based approach to capture the relationships between different venues and the match between venue and dialog context. For response generation, we combine the topic-based component with the idea of pointer networks, which allows us to effectively incorporate recommendation results. We perform extensive evaluation on a multi-turn task-oriented dialog dataset in the travel domain, and the results show that our method achieves superior performance compared to a wide range of baselines.
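As background for the GCN-based venue component, the sketch below shows one graph convolution step using the commonly used propagation rule H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W). The abstract does not say which GCN variant DCR uses, so this rule, the function names, and the tiny venue graph are assumptions for illustration only.

```python
import math


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a square adjacency matrix."""
    n = len(adj)
    # Add self-loops so each node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def gcn_layer(adj, features, weights):
    """One propagation step: aggregate neighbors, project, apply ReLU."""
    propagated = matmul(matmul(normalize_adjacency(adj), features), weights)
    return [[max(0.0, value) for value in row] for row in propagated]


# Two hypothetical venues linked by one edge, each with a 1-d feature.
adjacency = [[0, 1], [1, 0]]
venue_features = [[1.0], [3.0]]
identity_weights = [[1.0]]
print(gcn_layer(adjacency, venue_features, identity_weights))
```

Each venue's new representation mixes its own features with those of its neighbors, which is how relationships between venues can inform the recommendation.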
ConvLab: Multi-Domain End-to-End Dialog System Platform
Lee, Sungjin, Zhu, Qi, Takanobu, Ryuichi, Li, Xiang, Zhang, Yaoqin, Zhang, Zheng, Li, Jinchao, Peng, Baolin, Li, Xiujun, Huang, Minlie, Gao, Jianfeng
We present ConvLab, an open-source multi-domain end-to-end dialog system platform that enables researchers to quickly set up experiments with reusable components and compare a large set of different approaches, ranging from conventional pipeline systems to end-to-end neural models, in common environments. ConvLab offers a set of fully annotated datasets and associated pre-trained reference models. As a showcase, we extend the MultiWOZ dataset with user dialog act annotations to train all component models and demonstrate how ConvLab makes it easy to conduct complicated experiments in multi-domain end-to-end dialog settings.
Aggregating E-commerce Search Results from Heterogeneous Sources via Hierarchical Reinforcement Learning
Takanobu, Ryuichi, Zhuang, Tao, Huang, Minlie, Feng, Jun, Tang, Haihong, Zheng, Bo
In this paper, we investigate the task of aggregating search results from heterogeneous sources in an E-commerce environment. First, unlike traditional aggregated web search, which merely presents multi-sourced results on the first page, this new task may present aggregated results on all pages and has to dynamically decide which source should be presented on the current page. Second, as pointed out by many existing studies, it is not trivial to rank items from heterogeneous sources because the relevance scores from different source systems are not directly comparable. To address these two issues, we decompose the task into two subtasks in a hierarchical structure: a high-level task for source selection, where we model the sequential patterns of user behavior on aggregated results across pages to understand user intents and select the relevant sources properly; and a low-level task for item presentation, where we formulate a slot filling process that presents the items sequentially instead of assigning each item a relevance score to decide the presentation order of heterogeneous items. Since both subtasks can be naturally formulated as sequential decision problems and can be learned from future user feedback on search results, we build our model with hierarchical reinforcement learning. Extensive experiments demonstrate that our model obtains remarkable improvements in search performance metrics and achieves higher user satisfaction.
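The two-level decomposition described in this abstract can be sketched structurally as follows. This is not the paper's model and involves no learning: `toy_source_value` and `toy_slot_value` are hypothetical hand-written stand-ins for what learned high-level and low-level policies would provide, and the inventory and preference numbers are invented for illustration.

```python
def select_sources(history, sources, source_value, k):
    """High-level step: pick the k sources to show on the current page."""
    ranked = sorted(sources, key=lambda s: source_value(history, s), reverse=True)
    return ranked[:k]


def fill_slots(candidates, slot_value, num_slots):
    """Low-level step: fill page slots one at a time.

    Each choice depends on the slot position and on what is already on the
    page, rather than on one global relevance score per item.
    """
    remaining = list(candidates)
    page = []
    for slot in range(num_slots):
        if not remaining:
            break
        best = max(remaining, key=lambda item: slot_value(slot, item, page))
        page.append(best)
        remaining.remove(best)
    return page


def toy_source_value(history, source):
    """Hypothetical: favor sources the user clicked on earlier pages."""
    return history.count(source)


def toy_slot_value(slot, item, page_so_far):
    """Hypothetical: toy preference, discounted if it repeats the last source."""
    source, _name, preference = item
    repeats = bool(page_so_far) and page_so_far[-1][0] == source
    return preference - (0.5 if repeats else 0.0)


# Invented inventory: (source, item name, toy preference).
inventory = {
    "products": [("products", "p1", 0.9), ("products", "p2", 0.8)],
    "reviews": [("reviews", "r1", 0.7)],
    "topics": [("topics", "t1", 0.6)],
}
history = ["products", "reviews", "products"]  # sources clicked so far
chosen = select_sources(history, list(inventory), toy_source_value, k=2)
candidates = [item for source in chosen for item in inventory[source]]
page = fill_slots(candidates, toy_slot_value, num_slots=3)
print(chosen, [name for _source, name, _pref in page])
```

In a reinforcement learning setting, both value functions would be replaced by policies trained on user feedback, with the high-level decision constraining which items the low-level step may place.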