Our goal is to share with the community the breadth of artificial intelligence (AI) and natural language (NL) technologies required to develop such an application, along with lessons learned from target end users. We first give an overview of our application from the perspective of the end user. We then present the architecture of our application along with the main AI and NL components, which were developed over multiple phases. The first phase focuses on enabling core functionality, such as effectively finding programs that match the user's intent. The second phase focuses on enabling dialogue with the user.
Yeh, Peter Z. (Nuance Communications) | Douglas, Ben (Nuance Communications) | Jarrold, William (Nuance Communications) | Ratnaparkhi, Adwait (Nuance Communications) | Ramachandran, Deepak (Nuance Communications) | Patel-Schneider, Peter F. (Nuance Communications) | Laverty, Stephen (Nuance Communications) | Tikku, Nirvana (Nuance Communications) | Brown, Sean (Nuance Communications) | Mendel, Jeremy (Nuance Communications)
In this paper, we present a speech-driven second screen application for TV program discovery. We give an overview of the application and its architecture. We also present a user study along with a failure analysis. The results from the study are encouraging and demonstrate our application's effectiveness in the target domain. We conclude with a discussion of follow-on efforts to further enhance our application.
In this paper, we present an approach that mines large-scale knowledge graphs to discover inference paths for query expansion in NLIDB (Natural Language Interfaces to Databases). Addressing this problem is important for NLIDB applications to effectively handle relevant concepts in the domain of interest that do not correspond to any structured field in the target database. We also present preliminary observations on the performance of our approach applied to Freebase, and conclude with a discussion of next steps to further evaluate and extend our approach.
Spoken language is an important and natural way for people to communicate with computers. Nonetheless, habitable, reliable, and efficient human-machine dialogue remains difficult to achieve. This paper describes a multi-threaded semi-synchronous architecture for spoken dialogue systems. The focus here is on its utterance interpretation module. Unlike most architectures for spoken dialogue systems, this new one is designed to be robust to noisy speech recognition through earlier reliance on context, a mixture of rationales for interpretation, and fine-grained use of confidence measures. We report here on a pilot study that demonstrates its robust understanding of users’ objectives, and we compare it with our earlier spoken dialogue system implemented in a traditional pipeline architecture. Substantial improvements appear at all tested levels of recognizer performance.
Khatri, Chandra, Hedayatnia, Behnam, Venkatesh, Anu, Nunn, Jeff, Pan, Yi, Liu, Qing, Song, Han, Gottardi, Anna, Kwatra, Sanjeev, Pancholi, Sanju, Cheng, Ming, Chen, Qinglang, Stubel, Lauren, Gopalakrishnan, Karthik, Bland, Kate, Gabriel, Raefer, Mandal, Arindam, Hakkani-Tur, Dilek, Hwang, Gene, Michel, Nate, King, Eric, Prasad, Rohit
Building open-domain conversational systems that allow users to have engaging conversations on topics of their choice is a challenging task. The Alexa Prize was launched in 2016 to tackle the problem of achieving natural, sustained, coherent, and engaging open-domain dialogs. In the second iteration of the competition in 2018, university teams advanced the state of the art by using context in dialog models, leveraging knowledge graphs for language understanding, handling complex utterances, building statistical and hierarchical dialog managers, and leveraging model-driven signals from user responses. The 2018 competition also provided competitors with a suite of tools and models, including the CoBot (conversational bot) toolkit, topic and dialog act detection models, conversation evaluators, and a sensitive content detection model, so that the competing teams could focus on building knowledge-rich, coherent, and engaging multi-turn dialog systems. This paper outlines the advances developed by the university teams as well as the Alexa Prize team to achieve the common goal of advancing the science of Conversational AI. We address several key open-ended problems, such as conversational speech recognition, open-domain natural language understanding, commonsense reasoning, statistical dialog management, and dialog evaluation. These collaborative efforts have improved the experience of Alexa users, raising the average rating to 3.61, the median conversation duration to 2 minutes 18 seconds, and the average number of turns to 14.6, increases of 14%, 92%, and 54%, respectively, since the launch of the 2018 competition. For conversational speech recognition, we have improved our relative Word Error Rate by 55% and our relative Entity Error Rate by 34% since the launch of the Alexa Prize.
Socialbots improved in quality significantly more rapidly in 2018, in part due to the release of the CoBot toolkit: new entrants attained an average rating of 3.35 just 1 week into the semifinals, compared to 9 weeks in the 2017 competition.