Ghandeharioun, Asma, Shen, Judy Hanwen, Jaques, Natasha, Ferguson, Craig, Jones, Noah, Lapedriza, Agata, Picard, Rosalind
Building an open-domain conversational agent is a challenging problem. Current evaluation methods, mostly post-hoc judgments of static conversations, do not capture conversation quality in a realistic interactive context. In this paper, we investigate interactive human evaluation and provide evidence for its necessity; we then introduce a novel, model-agnostic, and dataset-agnostic method to approximate it. In particular, we propose a self-play scenario where the dialog system talks to itself and we calculate a combination of proxies such as sentiment and semantic coherence on the conversation trajectory. We show that this metric is capable of capturing the human-rated quality of a dialog model better than any automated metric known to date, achieving a significant Pearson correlation (r > .7, p < .05).
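A minimal sketch of the self-play metric described above, assuming a generic `respond` interface for the dialog model and stand-in `embed` and `sentiment` scorers (all three names are hypothetical); the proxies and weights are illustrative, not the paper's exact formulation:

```python
import numpy as np

def self_play_rollout(respond, seed_utterance, turns=10):
    """Roll out a conversation in which the model talks to itself.

    `respond` maps the utterance history to the next utterance; it is
    a stand-in for any dialog model's generation interface.
    """
    history = [seed_utterance]
    for _ in range(turns):
        history.append(respond(history))
    return history

def self_play_score(history, embed, sentiment, w_sent=0.5, w_coh=0.5):
    """Combine proxy metrics over the conversation trajectory."""
    embs = [embed(u) for u in history]
    # Semantic coherence proxy: mean cosine similarity of adjacent turns.
    coherence = np.mean([
        np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)
        for a, b in zip(embs, embs[1:])
    ])
    # Sentiment proxy: mean sentiment score across the trajectory.
    sent = np.mean([sentiment(u) for u in history])
    # Weighted combination; the weights here are illustrative only.
    return w_sent * sent + w_coh * coherence
```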
Zheng, Zilong, Wang, Wenguan, Qi, Siyuan, Zhu, Song-Chun
We propose a novel model to address the task of Visual Dialog, which exhibits complex dialog structures. To obtain a reasonable answer based on the current question and the dialog history, the underlying semantic dependencies between dialog entities are essential. In this paper, we explicitly formalize this task as inference in a graphical model with partially observed nodes and unknown graph structures (relations in dialog). The given dialog entities are viewed as the observed nodes. The answer to a given question is represented by a node with a missing value. We first introduce an Expectation Maximization algorithm to infer both the underlying dialog structures and the missing node values (desired answers). Based on this, we then propose a differentiable graph neural network (GNN) solution that approximates this process. Experimental results on the VisDial and VisDial-Q datasets show that our model outperforms competing methods. It is also observed that our method can infer the underlying dialog structure for better dialog reasoning.
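As a rough intuition for the EM alternation, the toy sketch below infers soft edges to a single unobserved answer node and re-estimates its value; `infer_missing_node` and its similarity-softmax structure step are assumptions for illustration, not the paper's actual algorithm:

```python
import numpy as np

def infer_missing_node(observed, iters=20, tau=1.0):
    """EM-style alternation on a graphical model with one unobserved node.

    `observed` is an (n, d) array of embeddings for the observed dialog
    entities. We alternately infer (E-step) soft edges between the
    unobserved answer node and each observed node, and (M-step) the
    answer node's value as the edge-weighted combination of its
    neighbors. A deliberately simplified caricature of the paper's EM.
    """
    missing = observed.mean(axis=0)  # init: centroid of observed nodes
    for _ in range(iters):
        # E-step: dialog structure as a softmax over similarities
        # between the current answer estimate and each observed node.
        scores = observed @ missing / tau
        edges = np.exp(scores - scores.max())
        edges /= edges.sum()
        # M-step: re-estimate the missing node value given the structure.
        missing = edges @ observed
    return missing, edges
```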
Onken, Arno, Grünewälder, Steffen, Obermayer, Klaus
The linear correlation coefficient is typically used to characterize and analyze dependencies of neural spike counts. Here, we show that the correlation coefficient is in general insufficient to characterize these dependencies. We construct two-neuron spike count models with Poisson-like marginals and vary their dependence structure using copulas. To this end, we construct a copula that allows us to keep the spike counts uncorrelated while varying their dependence strength. Moreover, we employ a network of leaky integrate-and-fire neurons to investigate whether weakly correlated spike counts with strong dependencies are likely to occur in real networks.
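The sketch below shows one standard way to realize such a copula: mixing comonotone and countermonotone dependence so the linear correlation nearly cancels while the variables stay strongly coupled, then imposing Poisson marginals via the inverse CDF. It illustrates the general construction, not the paper's specific copula family:

```python
import numpy as np
from scipy.stats import poisson

def sample_uncorrelated_dependent(n, lam=5.0, mix=0.5, rng=None):
    """Sample spike-count pairs that are nearly uncorrelated yet
    strongly dependent.

    With probability `mix` the pair is comonotone (V = U), otherwise
    countermonotone (V = 1 - U); at mix=0.5 the positive and negative
    dependence largely cancel in the linear correlation, while each
    pair remains deterministically coupled. Poisson marginals come
    from the inverse-CDF transform. (A small residual correlation
    survives because the Poisson marginal is asymmetric.)
    """
    rng = np.random.default_rng(rng)
    # Avoid the 0/1 endpoints, where the inverse CDF degenerates.
    u = rng.uniform(1e-12, 1 - 1e-12, n)
    v = np.where(rng.random(n) < mix, u, 1.0 - u)
    x = poisson.ppf(u, lam).astype(int)
    y = poisson.ppf(v, lam).astype(int)
    return x, y

x, y = sample_uncorrelated_dependent(100_000)
print(np.corrcoef(x, y)[0, 1])  # near 0 despite strong dependence
```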
Saleh, Abdelrhman, Deutsch, Tovly, Casper, Stephen, Belinkov, Yonatan, Shieber, Stuart
The predominant approach to open-domain dialog generation relies on end-to-end training of neural models on chat datasets. However, this approach provides little insight as to what these models learn (or do not learn) about engaging in dialog. In this study, we analyze the internal representations learned by neural open-domain dialog systems and evaluate the quality of these representations for learning basic conversational skills. Our results suggest that standard open-domain dialog systems struggle with answering questions, inferring contradiction, and determining the topic of conversation, among other tasks. We also find that the dyadic, turn-taking nature of dialog is not fully leveraged by these models. By exploring these limitations, we highlight the need for additional research into architectures and training methods that can better capture high-level information about dialog.
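A generic linear-probing recipe of the kind used for such representation analyses, sketched below; the function name and the toy data are assumptions, and the paper's exact probing protocol may differ:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def probe_accuracy(states, labels):
    """Fit a linear probe on frozen dialog-model representations.

    `states`: (n, d) hidden states extracted from the dialog model;
    `labels`: annotations for the conversational skill being probed
    (e.g., question vs. non-question). High held-out accuracy suggests
    the skill is linearly decodable from the representation.
    """
    X_tr, X_te, y_tr, y_te = train_test_split(
        states, labels, test_size=0.2, random_state=0)
    probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return probe.score(X_te, y_te)

# Toy usage with random stand-in features and a trivially decodable label.
rng = np.random.default_rng(0)
states = rng.normal(size=(1000, 64))
labels = (states[:, 0] > 0).astype(int)
print(probe_accuracy(states, labels))
```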
Davidson, Sam, Yu, Dian, Yu, Zhou
Compared to constituency parsing and semantic role labeling, dependency parsing provides clearer relationships between predicates and arguments (Johansson and Nugues, 2008). Constituency parsers provide information about noun phrases in a sentence, but provide only limited information about relationships within a noun phrase. For example, in the sentence "What do you think about Google's privacy policy being reviewed by journalists from CNN?," a constituency parser would place "Google's privacy policy being reviewed by journalists from CNN" under a single phrasal node. Similarly, a semantic role labeling system would tend to label the same phrase as an argument of the verb, but it would not disambiguate the relationships within the phrase. Finally, named entity recognition (NER) only provides information about named entities, which may or may not be the key semantic content of the sentence. Dependency parsers, by contrast, can provide information about relationships when a sentence contains multiple entities, even when those entities are within the same phrase. Identifying relationships between entities in a user utterance can help a dialog system formulate a more appropriate response. For instance, in the sentence about "Google's privacy policy" mentioned above, there are multiple entities for the system to consider. The system must determine the most important entity in the utterance in order to model the topic and generate an appropriate response.
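For illustration, the sketch below runs the example sentence through spaCy's dependency parser (the choice of spaCy and the `en_core_web_sm` model is ours, not the authors'):

```python
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("What do you think about Google's privacy policy "
          "being reviewed by journalists from CNN?")

# Unlike a constituency parse, the dependency parse exposes relations
# *inside* the long noun phrase, e.g. policy -> reviewed -> journalists -> CNN.
for token in doc:
    print(f"{token.text:12} --{token.dep_:>10}--> {token.head.text}")
```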