To build Sounding Board, we develop a system architecture that is capable of accommodating dialog strategies that we designed for socialbot conversations. The architecture consists of a multi-dimensional language understanding module for analyzing user utterances, a hierarchical dialog management framework for dialog context tracking and complex dialog control, and a language generation process that realizes the response plan and makes adjustments for speech synthesis. Additionally, we construct a new knowledge base to power the socialbot by collecting social chat content from a variety of sources. An important contribution of the system is the synergy between the knowledge base and the dialog management, i.e., the use of a graph structure to organize the knowledge base that makes dialog control very efficient in bringing related content to the discussion. Using the data collected from Sounding Board during the competition, we carry out in-depth analyses of socialbot conversations and user ratings which provide valuable insights in evaluation methods for socialbots. We additionally investigate a new approach for system evaluation and diagnosis that allows scoring individual dialog segments in the conversation. Finally, observing that socialbots suffer from the issue of shallow conversations about topics associated with unstructured data, we study the problem of enabling extended socialbot conversations grounded on a document. To bring together machine reading and dialog control techniques, a graph-based document representation is proposed, together with methods for automatically constructing the graph. Using the graph-based representation, dialog control can be carried out by retrieving nodes or moving along edges in the graph. To illustrate the usage, a mixed-initiative dialog strategy is designed for socialbot conversations on news articles.
Khatri, Chandra, Hedayatnia, Behnam, Venkatesh, Anu, Nunn, Jeff, Pan, Yi, Liu, Qing, Song, Han, Gottardi, Anna, Kwatra, Sanjeev, Pancholi, Sanju, Cheng, Ming, Chen, Qinglang, Stubel, Lauren, Gopalakrishnan, Karthik, Bland, Kate, Gabriel, Raefer, Mandal, Arindam, Hakkani-Tur, Dilek, Hwang, Gene, Michel, Nate, King, Eric, Prasad, Rohit
Building open domain conversational systems that allow users to have engaging conversations on topics of their choice is a challenging task. Alexa Prize was launched in 2016 to tackle the problem of achieving natural, sustained, coherent and engaging open-domain dialogs. In the second iteration of the competition in 2018, university teams advanced the state of the art by using context in dialog models, leveraging knowledge graphs for language understanding, handling complex utterances, building statistical and hierarchical dialog managers, and leveraging model-driven signals from user responses. The 2018 competition also included the provision of a suite of tools and models to the competitors including the CoBot (conversational bot) toolkit, topic and dialog act detection models, conversation evaluators, and a sensitive content detection model so that the competing teams could focus on building knowledge-rich, coherent and engaging multi-turn dialog systems. This paper outlines the advances developed by the university teams as well as the Alexa Prize team to achieve the common goal of advancing the science of Conversational AI. We address several key open-ended problems such as conversational speech recognition, open domain natural language understanding, commonsense reasoning, statistical dialog management and dialog evaluation. These collaborative efforts have driven improved experiences by Alexa users to an average rating of 3.61, median duration of 2 mins 18 seconds, and average turns to 14.6, increases of 14%, 92%, 54% respectively since the launch of the 2018 competition. For conversational speech recognition, we have improved our relative Word Error Rate by 55% and our relative Entity Error Rate by 34% since the launch of the Alexa Prize. Socialbots improved in quality significantly more rapidly in 2018, in part due to the release of the CoBot toolkit, with new entrants attaining an average rating of 3.35 just 1 week into the semifinals, compared to 9 weeks in the 2017 competition.
Yu, Dian, Cohn, Michelle, Yang, Yi Mang, Chen, Chun-Yen, Wen, Weiming, Zhang, Jiaping, Zhou, Mingyang, Jesse, Kevin, Chau, Austin, Bhowmick, Antara, Iyer, Shreenath, Sreenivasulu, Giritheja, Davidson, Sam, Bhandare, Ashwin, Yu, Zhou
Gunrock is the winner of the 2018 Amazon Alexa Prize, as evaluated by coherence and engagement from both real users and Amazon-selected expert conversationalists. We focus on understanding complex sentences and having in-depth conversations in open domains. In this paper, we introduce some innovative system designs and related validation analysis. Overall, we found that users produce longer sentences to Gunrock, which are directly related to users' engagement (e.g., ratings, number of turns). Additionally, users' backstory queries about Gunrock are positively correlated to user satisfaction. Finally, we found dialog flows that interleave facts and personal opinions and stories lead to better user satisfaction.
Gunrock 2.0 is built on top of Gunrock with an emphasis on user adaptation. Gunrock 2.0 combines various neural natural language understanding modules, including named entity detection, linking, and dialog act prediction, to improve user understanding. Its dialog management is a hierarchical model that handles various topics, such as movies, music, and sports. The system-level dialog manager can handle question detection, acknowledgment, error handling, and additional functions, making downstream modules much easier to design and implement. The dialog manager also adapts its topic selection to accommodate different users' profile information, such as inferred gender and personality. The generation model is a mix of templates and neural generation models. Gunrock 2.0 is able to achieve an average rating of 3.73 at its latest build from May 29th to June 4th.
Venkatesh, Anu, Khatri, Chandra, Ram, Ashwin, Guo, Fenfei, Gabriel, Raefer, Nagar, Ashish, Prasad, Rohit, Cheng, Ming, Hedayatnia, Behnam, Metallinou, Angeliki, Goel, Rahul, Yang, Shaohua, Raju, Anirudh
Conversational agents are exploding in popularity. However, much work remains in the area of non goal-oriented conversations, despite significant growth in research interest over recent years. To advance the state of the art in conversational AI, Amazon launched the Alexa Prize, a 2.5-million dollar university competition where sixteen selected university teams built conversational agents to deliver the best social conversational experience. Alexa Prize provided the academic community with the unique opportunity to perform research with a live system used by millions of users. The subjectivity associated with evaluating conversations is key element underlying the challenge of building non-goal oriented dialogue systems. In this paper, we propose a comprehensive evaluation strategy with multiple metrics designed to reduce subjectivity by selecting metrics which correlate well with human judgement. The proposed metrics provide granular analysis of the conversational agents, which is not captured in human ratings. We show that these metrics can be used as a reasonable proxy for human judgment. We provide a mechanism to unify the metrics for selecting the top performing agents, which has also been applied throughout the Alexa Prize competition. To our knowledge, to date it is the largest setting for evaluating agents with millions of conversations and hundreds of thousands of ratings from users. We believe that this work is a step towards an automatic evaluation process for conversational AIs.