Chung, Tagyoung


Towards Coherent and Engaging Spoken Dialog Response Generation Using Automatic Conversation Evaluators

arXiv.org Artificial Intelligence

Encoder-decoder based neural architectures serve as the basis of state-of-the-art approaches in end-to-end open domain dialog systems. Since most of such systems are trained with a maximum likelihood(MLE) objective they suffer from issues such as lack of generalizability and the generic response problem, i.e., a system response that can be an answer to a large number of user utterances, e.g., "Maybe, I don't know." Having explicit feedback on the relevance and interestingness of a system response at each turn can be a useful signal for mitigating such issues and improving system quality by selecting responses from different approaches. Towards this goal, we present a system that evaluates chatbot responses at each dialog turn for coherence and engagement. Our system provides explicit turn-level dialog quality feedback, which we show to be highly correlated with human evaluation. To show that incorporating this feedback in the neural response generation models improves dialog quality, we present two different and complementary mechanisms to incorporate explicit feedback into a neural response generation model: reranking and direct modification of the loss function during training. Our studies show that a response generation model that incorporates these combined feedback mechanisms produce more engaging and coherent responses in an open-domain spoken dialog setting, significantly improving the response quality using both automatic and human evaluation.


Simple Question Answering with Subgraph Ranking and Joint-Scoring

arXiv.org Machine Learning

Knowledge graph based simple question answering (KBSQA) is a major area of research within question answering. Although only dealing with simple questions, i.e., questions that can be answered through a single knowledge base (KB) fact, this task is neither simple nor close to being solved. Targeting on the two main steps, subgraph selection and fact selection, the research community has developed sophisticated approaches. However, the importance of subgraph ranking and leveraging the subject--relation dependency of a KB fact have not been sufficiently explored. Motivated by this, we present a unified framework to describe and analyze existing approaches. Using this framework as a starting point, we focus on two aspects: improving subgraph selection through a novel ranking method and leveraging the subject--relation dependency by proposing a joint scoring CNN model with a novel loss function that enforces the well-order of scores. Our methods achieve a new state of the art (85.44% in accuracy) on the SimpleQuestions dataset.


Flexible and Scalable State Tracking Framework for Goal-Oriented Dialogue Systems

arXiv.org Artificial Intelligence

Goal-oriented dialogue systems typically rely on components specifically developed for a single task or domain. This limits such systems in two different ways: If there is an update in the task domain, the dialogue system usually needs to be updated or completely re-trained. It is also harder to extend such dialogue systems to different and multiple domains. The dialogue state tracker in conventional dialogue systems is one such component - it is usually designed to fit a well-defined application domain. For example, it is common for a state variable to be a categorical distribution over a manually-predefined set of entities (Henderson et al., 2013), resulting in an inflexible and hard-to-extend dialogue system. In this paper, we propose a new approach for dialogue state tracking that can generalize well over multiple domains without incorporating any domain-specific knowledge. Under this framework, discrete dialogue state variables are learned independently and the information of a predefined set of possible values for dialogue state variables is not required. Furthermore, it enables adding arbitrary dialogue context as features and allows for multiple values to be associated with a single state variable. These characteristics make it much easier to expand the dialogue state space. We evaluate our framework using the widely used dialogue state tracking challenge data set (DSTC2) and show that our framework yields competitive results with other state-of-the-art results despite incorporating little domain knowledge. We also show that this framework can benefit from widely available external resources such as pre-trained word embeddings.


Parsing Coordination for Spoken Language Understanding

arXiv.org Machine Learning

ABSTRACT Typical spoken language understanding systems provide narrow semantic parses using a domain-specific ontology. The parses contain intents and slots that are directly consumed by downstream domain applications. In this work we discuss expanding such systems to handle compound entities and intents by introducing a domain-agnostic shallow parser that handles linguistic coordination. We show that our model for parsing coordination learns domain-independent and slot-independent features and is able to segment conjunct boundaries of many different phrasal categories. We also show that using adversarial training can be effective for improving generalization across different slot types for coordination parsing. Index Terms-- spoken language understanding, chunking, coordination 1. INTRODUCTION A typical spoken language understanding (SLU) system maps user utterances to domain-specific semantic representations that can be factored into an intent and slots [1, 2]. For example, an utterance, "what is the weather like in boston" has one intent WeatherInfo and one slot type CityName whose value is "boston." Thus, parsing for such systems is often factored into two separate tasks: intent classification and entity recognition whose results are consumed by downstream domain applications.


Multi-Task Learning For Parsing The Alexa Meaning Representation Language

AAAI Conferences

The Alexa Meaning Representation Language (AMRL) is a compositional graph-based semantic representation that includes fine-grained types, properties, actions, and roles and can represent a wide variety of spoken language.  AMRL increases the ability of virtual assistants to represent more complex requests, including logical and conditional statements as well as ones with nested clauses. Due to this representational capacity, the acquisition of large scale data resources is challenging, which limits the accuracy of resulting models. This paper has two primary contributions. First, we develop a linearization of AMRL graphs along with a deep multi-task model that predicts fine-grained types, properties, and intents. Second, we show how to jointly train a model that predicts an existing representation for spoken language understanding (SLU) along with the linearized AMRL parse. The resulting model, which leverages learned embeddings from both tasks, is able to predict the AMRL representation more accurately than other approaches, decreasing the error rates in the full parse by 3.56% absolute and reducing the amount of natively annotated data needed to train accurate parsing models.