Balakrishnan, Anusha
The Whole Truth and Nothing But the Truth: Faithful and Controllable Dialogue Response Generation with Dataflow Transduction and Constrained Decoding
Fang, Hao, Balakrishnan, Anusha, Jhamtani, Harsh, Bufe, John, Crawford, Jean, Krishnamurthy, Jayant, Pauls, Adam, Eisner, Jason, Andreas, Jacob, Klein, Dan
In a real-world dialogue system, generated text must be truthful and informative while remaining fluent and adhering to a prescribed style. Satisfying these constraints simultaneously is difficult for the two predominant paradigms in language generation: neural language modeling and rule-based generation. We describe a hybrid architecture for dialogue response generation that combines the strengths of both paradigms. The first component of this architecture is a rule-based content selection model defined using a new formal framework called dataflow transduction, which uses declarative rules to transduce a dialogue agent's actions and their results (represented as dataflow graphs) into context-free grammars representing the space of contextually acceptable responses. The second component is a constrained decoding procedure that uses these grammars to constrain the output of a neural language model, which selects fluent utterances. Our experiments show that this system outperforms both rule-based and learned approaches in human evaluations of fluency, relevance, and truthfulness.
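The division of labor described here, in which declarative rules compile the dialogue state into a grammar of truthful responses and a neural LM selects the most fluent string that grammar licenses, can be sketched in a few lines. The following is a hypothetical toy, not the paper's system: the grammar, the meeting-time example, and toy_lm_score are all illustrative, and a real implementation would intersect the grammar with the LM token by token during decoding rather than enumerating the language.

```python
# Toy sketch: a CFG (as dataflow transduction might emit for one dialogue
# state) defines the space of truthful responses; a stand-in "LM" score
# then selects among them for fluency.
import itertools

# Hypothetical grammar over acceptable responses for one dialogue state.
GRAMMAR = {
    "S": [["TIME", "PLACE"]],
    "TIME": [["Your meeting is at 3pm"], ["The meeting you asked about is at 3pm"]],
    "PLACE": [["in Room 4."], ["in Conference Room 4."]],
}

def expand(symbol):
    """Enumerate all strings a symbol derives (grammar must be non-recursive)."""
    if symbol not in GRAMMAR:          # terminal: a literal text fragment
        return [symbol]
    strings = []
    for rhs in GRAMMAR[symbol]:
        for parts in itertools.product(*(expand(s) for s in rhs)):
            strings.append(" ".join(parts))
    return strings

def toy_lm_score(text):
    """Stand-in for a neural LM's log-probability: prefer shorter strings."""
    return -len(text.split())

candidates = expand("S")                  # every candidate is truthful by construction
print(max(candidates, key=toy_lm_score))  # the "LM" contributes only fluency
```

Because every candidate is derived from the rule-built grammar, the language model can never introduce an untruthful fact; it only arbitrates fluency and style among pre-vetted options.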
Recommendation as a Communication Game: Self-Supervised Bot-Play for Goal-oriented Dialogue
Kang, Dongyeop, Balakrishnan, Anusha, Shah, Pararth, Crook, Paul, Boureau, Y-Lan, Weston, Jason
Traditional recommendation systems produce static rather than interactive recommendations invariant to a user's specific requests, clarifications, or current mood, and can suffer from the cold-start problem when a user's tastes are unknown. These issues can be alleviated by treating recommendation as an interactive dialogue task instead, where an expert recommender can sequentially ask about a user's preferences, react to their requests, and recommend more appropriate items. In this work, we collect a goal-driven recommendation dialogue dataset (GoRecDial), which consists of 9,125 dialogue games and 81,260 conversation turns between pairs of human workers recommending movies to each other. The task is specifically designed as a cooperative game between two players working towards a quantifiable common goal. We leverage the dataset to develop an end-to-end dialogue system that can simultaneously converse and recommend. Models are first trained to imitate the behavior of human players without considering the task goal itself (supervised training). We then finetune our models on simulated bot-bot conversations between two paired pre-trained models (bot-play), in order to achieve the dialogue goal. Our experiments show that models finetuned with bot-play learn improved dialogue strategies, reach the dialogue goal more often when paired with a human, and are rated as more consistent by humans compared to models trained without bot-play. The dataset and code are publicly available through the ParlAI framework.
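As a rough illustration of the bot-play stage, the sketch below simulates one game between two agents and scores the outcome. Everything here (ToyAgent, the reward values, the movie list) is a hypothetical stand-in for the paper's pretrained models and ParlAI training code, not the released implementation.

```python
# Sketch of one bot-play episode: two pretrained agents converse until the
# seeker guesses the goal movie; the scalar reward would drive fine-tuning.
import random

class ToyAgent:
    """Hypothetical stand-in for a supervised, pretrained dialogue model."""
    def __init__(self, movies):
        self.movies = movies

    def respond(self, history):
        # A real agent conditions on the dialogue history and generates text.
        return random.choice(self.movies)

def bot_play_episode(recommender, seeker, goal_movie, max_turns=5):
    """Run one simulated game and return (dialogue, reward)."""
    history = []
    for _ in range(max_turns):
        history.append(("recommender", recommender.respond(history)))
        guess = seeker.respond(history)
        history.append(("seeker", guess))
        if guess == goal_movie:
            return history, 1.0        # quantifiable common goal reached
    return history, -1.0               # ran out of turns: penalize

movies = ["Alien", "Up", "Heat"]
dialogue, reward = bot_play_episode(ToyAgent(movies), ToyAgent(movies), "Up")
# In the paper, rewards from many such simulated games fine-tune the paired
# models so their dialogue strategies improve beyond pure imitation.
print(reward)
```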
Constrained Decoding for Neural NLG from Compositional Representations in Task-Oriented Dialogue
Balakrishnan, Anusha, Rao, Jinfeng, Upasani, Kartikeya, White, Michael, Subba, Rajen
Generating fluent natural language responses from structured semantic representations is a critical step in task-oriented conversational systems. Avenues like the E2E NLG Challenge have encouraged the development of neural approaches, particularly sequence-to-sequence (Seq2Seq) models for this problem. The semantic representations used, however, are often underspecified, which places a higher burden on the generation model for sentence planning, and also limits the extent to which generated responses can be controlled in a live system. In this paper, we (1) propose using tree-structured semantic representations, like those used in traditional rule-based NLG systems, for better discourse-level structuring and sentence-level planning; (2) introduce a challenging dataset using this representation for the weather domain; (3) introduce a constrained decoding approach for Seq2Seq models that leverages this representation to improve semantic correctness; and (4) demonstrate promising results on our dataset and the E2E dataset.
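To make the tree-structured representation concrete, the sketch below uses an invented bracketed MR and checks the two properties a constrained decoder would enforce token by token: brackets stay balanced, and the response realizes exactly the nonterminals of the input tree. The bracket syntax and helper names are illustrative and are not the dataset's exact schema.

```python
# Minimal sketch of decoding constrained by a tree-structured MR: the model
# emits the response with the MR's brackets inlined, and each generated
# token must keep the bracketing consistent with the input tree.

INPUT_MR = "[INFORM [condition sunny] [temperature 72]]"
CANDIDATE = "[INFORM It will be [condition sunny] with a high of [temperature 72] .]"

def nonterminals(s):
    """Collect bracketed nonterminal labels in order of opening."""
    toks = s.replace("]", " ] ").replace("[", " [ ").split()
    return [toks[i + 1] for i, t in enumerate(toks) if t == "["]

def balanced(s):
    """Check that every opened bracket is closed and none closes early."""
    depth = 0
    for ch in s:
        depth += (ch == "[") - (ch == "]")
        if depth < 0:
            return False
    return depth == 0

# A constrained decoder would enforce these checks incrementally, masking
# out any token that opens an unlicensed bracket or closes one too soon.
assert balanced(CANDIDATE)
assert nonterminals(CANDIDATE) == nonterminals(INPUT_MR)
print("candidate is consistent with the input MR")
```

Carrying the tree structure through to the output is what lets the system verify semantic correctness mechanically, rather than trusting the Seq2Seq model to mention every slot.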
Generate, Filter, and Rank: Grammaticality Classification for Production-Ready NLG Systems
Challa, Ashwini, Upasani, Kartikeya, Balakrishnan, Anusha, Subba, Rajen
Neural approaches to Natural Language Generation (NLG) have been promising for goal-oriented dialogue. One of the challenges of productionizing these approaches, however, is controlling response quality and ensuring that generated responses are acceptable. We propose the use of a generate, filter, and rank framework, in which candidate responses are first filtered to eliminate unacceptable responses, and then ranked to select the best response. While acceptability includes both grammatical correctness and semantic correctness, we focus only on grammaticality classification in this paper, and show that existing datasets for grammatical error correction do not correctly capture the distribution of errors that data-driven generators are likely to make. We release a grammaticality classification and semantic correctness classification dataset for the weather domain that consists of responses generated by three data-driven NLG systems. We then explore two supervised learning approaches (CNNs and GBDTs) for classifying grammaticality. Our experiments show that grammaticality classification is very sensitive to the distribution of errors in the data, and that these distributions vary significantly with both the source of the response and the domain. We show that it is possible to achieve high precision with reasonable recall on our dataset.
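The pipeline itself is straightforward to write down. In the sketch below, generate(), is_grammatical(), and rank_score() are hypothetical stand-ins for the paper's data-driven generator, trained grammaticality classifier (CNN/GBDT), and ranker; the candidates and the crude heuristic filter exist only to show the control flow.

```python
# Sketch of the generate -> filter -> rank pipeline described above.

def generate(mr, k=4):
    """Stand-in: a data-driven generator would return k candidates for mr."""
    return [
        "It is sunny and 72 degrees today.",
        "It sunny is and 72 today degrees.",   # the kind of error generators make
        "Today it is sunny, with a high of 72.",
        "Sunny 72.",
    ]

def is_grammatical(text):
    """Stand-in for the trained classifier; here, a crude surface heuristic."""
    return text.split()[0].istitle() and text.endswith(".") and "is and" not in text

def rank_score(text):
    """Stand-in ranker: prefer longer (more informative) responses."""
    return len(text.split())

candidates = generate(mr={"condition": "sunny", "temp": 72})
acceptable = [c for c in candidates if is_grammatical(c)]   # filter step
best = max(acceptable, key=rank_score)                      # rank step
print(best)
```

The design choice the paper argues for is that the filter must be trained on the error distribution of the generators actually in production, since (as the experiments show) off-the-shelf grammatical-error datasets exhibit a different error distribution.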
Malaria Likelihood Prediction by Effectively Surveying Households Using Deep Reinforcement Learning
Rajpurkar, Pranav, Polamreddi, Vinaya, Balakrishnan, Anusha
We build a deep reinforcement learning (RL) agent that can predict the likelihood of an individual testing positive for malaria by asking questions about their household. The RL agent learns to determine which survey question to ask next and when to stop and make a prediction about their likelihood of malaria based on the responses received so far. The agent incurs a small penalty for each question asked, and a large reward or penalty for making a correct or incorrect prediction; it thus has to learn to balance the length of the survey against the accuracy of its final predictions. Our RL agent is a Deep Q-network that learns a policy directly from the responses to the questions, with an action defined for each possible survey question and for each possible prediction class. We focus on Kenya, where malaria is a massive health burden, and train the RL agent on a dataset of 6481 households from the Kenya Malaria Indicator Survey 2015. To investigate the importance of having survey questions be adaptive to responses, we compare our RL agent to a supervised learning (SL) baseline that fixes its set of survey questions a priori. On a holdout set, we evaluate prediction accuracy and the number of survey questions asked, and find that the RL agent is able to predict with 80% accuracy using only 2.5 questions on average. In addition, the RL agent learns to survey adaptively in response to answers and is able to match the SL baseline in prediction accuracy while significantly reducing survey length.
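The action space described above, one action per survey question plus one per prediction class with a small per-question penalty and a large terminal reward, can be written down directly. Below is a hypothetical sketch in which a random policy stands in for the Deep Q-network; the reward constants, question count, and step cap are illustrative choices, not the paper's.

```python
# Sketch of the adaptive-survey episode: ask questions (small penalty each),
# then predict a class (large reward if correct, large penalty if wrong).
import random

N_QUESTIONS, N_CLASSES = 10, 2               # e.g., malaria-positive / negative
ASK_PENALTY, CORRECT, WRONG = -0.05, 1.0, -1.0

def choose_action(state):
    """Stand-in for the DQN: it would return argmax_a Q(state, a)."""
    return random.randrange(N_QUESTIONS + N_CLASSES)

def run_episode(household, label, max_steps=50):
    """One survey: gather answers, then predict; return the total reward."""
    state = [None] * N_QUESTIONS             # None marks an unasked question
    total = 0.0
    for _ in range(max_steps):
        a = choose_action(state)
        if a < N_QUESTIONS:                  # action: ask question a
            state[a] = household[a]
            total += ASK_PENALTY
        else:                                # action: predict class a - N_QUESTIONS
            total += CORRECT if a - N_QUESTIONS == label else WRONG
            return total
    return total + WRONG                     # sketch-only fallback: never predicted

household = [random.randint(0, 1) for _ in range(N_QUESTIONS)]
print(run_episode(household, label=1))
```

The per-question penalty is what makes survey length part of the objective: a trained Q-network stops asking as soon as the expected gain in prediction reward no longer outweighs the cost of another question.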