During the past decade, several areas of speech and language understanding have witnessed substantial breakthroughs from the use of data-driven models. In the area of dialogue systems, the trend is less obvious, and most practical systems are still built through significant engineering and expert knowledge. Nevertheless, several recent results suggest that data-driven approaches are feasible and quite promising. To facilitate research in this area, we have carried out a wide survey of publicly available datasets suitable for data-driven learning of dialogue systems. We discuss important characteristics of these datasets, how they can be used to learn diverse dialogue strategies, and their other potential uses. We also examine methods for transfer learning between datasets and the use of external knowledge. Finally, we discuss appropriate choice of evaluation metrics for the learning objective.
In this paper we survey the methods and concepts developed for the evaluation of dialogue systems. Evaluation is a crucial part during the development process. Often, dialogue systems are evaluated by means of human evaluations and questionnaires. However, this tends to be very cost and time intensive. Thus, much work has been put into finding methods, which allow to reduce the involvement of human labour. In this survey, we present the main concepts and methods. For this, we differentiate between the various classes of dialogue systems (task-oriented dialogue systems, conversational dialogue systems, and question-answering dialogue systems). We cover each class by introducing the main technologies developed for the dialogue systems and then by presenting the evaluation methods regarding this class.
Deep learning has significantly improved state-of-the-art performance for natural language processing tasks like machine translation, summarization, question answering, and text classification. Each of these tasks is typically studied with a specific metric, and performance is often measured on a set of standard benchmark datasets. This has led to the development of architectures designed specifically for those tasks and metrics, but it does not necessarily promote the emergence of general NLP models, those which can perform well across a wide variety of NLP tasks. In order to explore the possibility of such models as well as the tradeoffs that arise in optimizing for them, we introduce the Natural Language Decathlon (decaNLP). The goal of the Decathlon is to explore models that generalize to all ten tasks and investigate how such models differ from those trained for single tasks.
This research hypothesized that a practical approach in the form of a solution framework known as Natural Language Understanding and Reasoning for Intelligence (NaLURI), which combines full-discourse natural language understanding, powerful representation formalism capable of exploiting ontological information and reasoning approach with advanced features, will solve the following problems without compromising practicality factors: 1) restriction on the nature of question and response, and 2) limitation to scale across domains and to real-life natural language text.