Davidson, Sam
User Simulation with Large Language Models for Evaluating Task-Oriented Dialogue
Davidson, Sam, Romeo, Salvatore, Shu, Raphael, Gung, James, Gupta, Arshit, Mansour, Saab, Zhang, Yi
One of the major impediments to the development of new task-oriented dialogue (TOD) systems is the need for human evaluation at multiple stages and iterations of the development process. To move toward automated evaluation of TOD, we propose a novel user simulator built on recently developed large pretrained language models (LLMs). To increase the linguistic diversity of our system relative to related previous work, we do not fine-tune the LLMs used by our system on existing TOD datasets; rather, we use in-context learning to prompt the LLMs to generate robust and linguistically diverse output, with the goal of simulating the behavior of human interlocutors. Unlike previous work, which sought to maximize goal success rate (GSR) as the primary metric of simulator performance, we aim for a system that achieves a GSR similar to that observed in human interactions with TOD systems. Using this approach, our current simulator can interact effectively with several TOD systems, especially on single-intent conversational goals, while generating lexically and syntactically more diverse output than previous simulators that rely on fine-tuned models. Finally, we collect a Human2Bot dataset of humans interacting with the same TOD systems used in our experiments, in order to better quantify these achievements.
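The shift in objective (matching human GSR rather than maximizing GSR) can be illustrated with a small sketch. It assumes GSR is the fraction of dialogues in which the user goal was completed and scores a simulator by the absolute gap between its GSR and the human GSR; the function names and the outcome lists below are hypothetical and are not data or code from the paper.

```python
from typing import Sequence


def goal_success_rate(outcomes: Sequence[bool]) -> float:
    """Fraction of dialogues in which the user's goal was completed."""
    if not outcomes:
        raise ValueError("need at least one dialogue")
    return sum(outcomes) / len(outcomes)


def gsr_gap(simulated: Sequence[bool], human: Sequence[bool]) -> float:
    """Absolute difference between simulator GSR and human GSR.

    Under a 'behave like humans' objective, a smaller gap is better,
    even if the simulator's own GSR is lower.
    """
    return abs(goal_success_rate(simulated) - goal_success_rate(human))


# Hypothetical per-dialogue outcomes, for illustration only.
human_runs = [True, True, False, True, False]   # GSR 0.6
simulator_a = [True, True, True, True, True]    # GSR 1.0: maximizes GSR
simulator_b = [True, False, True, True, False]  # GSR 0.6: matches humans

print(round(gsr_gap(simulator_a, human_runs), 2))  # 0.4
print(round(gsr_gap(simulator_b, human_runs), 2))  # 0.0
```

Under a GSR-maximizing objective, simulator_a would look best; under the human-matching objective, simulator_b does.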
IdEALS: Idiomatic Expressions for Advancement of Language Skills
Ri, Narutatsu, Sun, Bill, Davidson, Sam, Yu, Zhou
Although significant progress has been made in developing methods for Grammatical Error Correction (GEC), word choice improvements have received notably little attention, and enhancing sentence expressivity by replacing phrases with advanced expressions remains understudied. In this paper, we focus on this area and present our investigation into the task of incorporating idiomatic expressions into student writing. To facilitate our study, we curate extensive training sets and expert-annotated test sets from real-world data, evaluate various approaches, and compare their performance against that of human experts.
Using Chatbots to Teach Languages
Li, Yu, Chen, Chun-Yen, Yu, Dian, Davidson, Sam, Hou, Ryan, Yuan, Xun, Tan, Yinghua, Pham, Derek, Yu, Zhou
This paper reports on progress toward building an online language learning tool that gives learners conversational experience by using dialog systems as conversation practice partners. Our system can adapt to users' language proficiency on the fly. We also provide automatic grammar error feedback to help users learn from their mistakes. According to our early adopters, our system is entertaining and useful. Furthermore, we will provide the learning technology community with a large-scale conversation dataset on language learning and grammar correction. Our next step is to make our system more adaptive to user profile information by using reinforcement learning algorithms.
Gunrock 2.0: A User Adaptive Social Conversational System
Liang, Kaihui, Chau, Austin, Li, Yu, Lu, Xueyuan, Yu, Dian, Zhou, Mingyang, Jain, Ishan, Davidson, Sam, Arnold, Josh, Nguyen, Minh, Yu, Zhou
Gunrock 2.0 is built on top of Gunrock with an emphasis on user adaptation. Gunrock 2.0 combines various neural natural language understanding modules, including named entity detection, linking, and dialog act prediction, to improve user understanding. Its dialog management is a hierarchical model that handles various topics, such as movies, music, and sports. The system-level dialog manager handles question detection, acknowledgment, error handling, and other functions, making downstream modules much easier to design and implement. The dialog manager also adapts its topic selection to different users' profile information, such as inferred gender and personality. The generation model is a mix of templates and neural generation models. The latest build of Gunrock 2.0 achieved an average rating of 3.73 from May 29th to June 4th.
Gunrock: A Social Bot for Complex and Engaging Long Conversations
Yu, Dian, Cohn, Michelle, Yang, Yi Mang, Chen, Chun-Yen, Wen, Weiming, Zhang, Jiaping, Zhou, Mingyang, Jesse, Kevin, Chau, Austin, Bhowmick, Antara, Iyer, Shreenath, Sreenivasulu, Giritheja, Davidson, Sam, Bhandare, Ashwin, Yu, Zhou
Gunrock is the winner of the 2018 Amazon Alexa Prize, as evaluated by coherence and engagement from both real users and Amazon-selected expert conversationalists. We focus on understanding complex sentences and having in-depth conversations in open domains. In this paper, we introduce several innovative system designs and related validation analyses. Overall, we found that users produce longer sentences when talking to Gunrock, and sentence length is directly related to user engagement (e.g., ratings, number of turns). Additionally, users' backstory queries about Gunrock are positively correlated with user satisfaction. Finally, we found that dialog flows interleaving facts with personal opinions and stories lead to better user satisfaction.
Dependency Parsing for Spoken Dialog Systems
Davidson, Sam, Yu, Dian, Yu, Zhou
Compared to constituency parsing and semantic role labeling, dependency parsing provides clearer relationships between predicates and arguments (Johansson and Nugues, 2008). Constituency parsers provide information about noun phrases in a sentence, but only limited information about relationships within a noun phrase. For example, in the sentence "What do you think about Google's privacy policy being reviewed by journalists from CNN?", a constituency parser would place "Google's privacy policy being reviewed by journalists from CNN" under a single phrasal node. Similarly, a semantic role labeling system would tend to label the same phrase as an argument of the verb, but it would not disambiguate the relationships within the phrase. Finally, named entity recognition (NER) only provides information about named entities, which may or may not be the key semantic content of the sentence. Dependency parsers, by contrast, can provide information about relationships when a sentence contains multiple entities, even when those entities are within the same phrase. Identifying relationships between entities in a user utterance can help a dialog system formulate a more appropriate response. For instance, the sentence about "Google's privacy policy" above contains multiple entities for the system to consider, and the system must determine the most important one in order to model the topic and generate an appropriate response.
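To make the contrast with a single constituency node concrete, the sketch below encodes the example sentence as a hand-written, approximately Universal Dependencies-style parse. This is our own illustrative annotation, not the output of the parser described here. Walking each entity's head chain shows that "Google" attaches to "policy" as a possessor, while "CNN" reaches "policy" only through "journalists" and "reviewed". The closest-to-root rule used to pick a salient entity is a hypothetical heuristic for illustration, not the paper's method.

```python
from typing import List, NamedTuple, Tuple


class Token(NamedTuple):
    text: str
    head: int  # index of the head token; -1 marks the root
    rel: str   # dependency relation label


# Hand-written, approximately UD-style parse (illustrative, not parser output).
SENTENCE: List[Token] = [
    Token("What", 3, "obj"),
    Token("do", 3, "aux"),
    Token("you", 3, "nsubj"),
    Token("think", -1, "root"),
    Token("about", 8, "case"),
    Token("Google", 8, "nmod:poss"),
    Token("'s", 5, "case"),
    Token("privacy", 8, "compound"),
    Token("policy", 3, "obl"),
    Token("being", 10, "aux:pass"),
    Token("reviewed", 8, "acl"),
    Token("by", 12, "case"),
    Token("journalists", 10, "obl:agent"),
    Token("from", 14, "case"),
    Token("CNN", 12, "nmod"),
    Token("?", 3, "punct"),
]


def path_to_root(tokens: List[Token], i: int) -> List[Tuple[str, str, str]]:
    """Return the (dependent, relation, head) arcs from token i up to the root."""
    path = []
    while tokens[i].head != -1:
        head = tokens[i].head
        path.append((tokens[i].text, tokens[i].rel, tokens[head].text))
        i = head
    return path


def depth(tokens: List[Token], i: int) -> int:
    """Number of arcs between token i and the root."""
    return len(path_to_root(tokens, i))


# Token indices of the named entities in SENTENCE.
ENTITIES = {"Google": 5, "CNN": 14}

for name, idx in ENTITIES.items():
    print(name, depth(SENTENCE, idx), path_to_root(SENTENCE, idx))

# Hypothetical heuristic: treat the entity closest to the root as most salient.
most_salient = min(ENTITIES, key=lambda name: depth(SENTENCE, ENTITIES[name]))
print(most_salient)  # Google
```

Both entities sit inside the phrase a constituency parser would treat as one node, yet the arcs distinguish a possessor of the topic noun from a modifier of the agent.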