Gildea, Daniel
Hierarchical Context Tagging for Utterance Rewriting
Jin, Lisa, Song, Linfeng, Jin, Lifeng, Yu, Dong, Gildea, Daniel
Utterance rewriting aims to resolve coreferences and recover omitted information in the latest turn of a multi-turn dialogue. Recently, methods that tag rather than linearly generate sequences have proven stronger in both in- and out-of-domain rewriting settings. This is due to a tagger's smaller search space, as it can only copy tokens from the dialogue context. However, these methods may suffer from low coverage when phrases that must be added to a source utterance cannot be covered by a single context span. This can occur in languages like English that introduce tokens such as prepositions into the rewrite for grammaticality. We propose a hierarchical context tagger (HCT) that mitigates this issue by predicting slotted rules (e.g., "besides_") whose slots are later filled with context spans. HCT (i) tags the source string with token-level edit actions and slotted rules and (ii) fills the resulting rule slots with spans from the dialogue context. This rule tagging allows HCT to add out-of-context tokens and multiple spans at once; we further cluster the rules to truncate the long tail of the rule distribution. Experiments on several benchmarks show that HCT can outperform state-of-the-art rewriting systems by ~2 BLEU points.
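The two-stage procedure above can be made concrete with a small sketch. Below is a hedged Python illustration of how slotted rules might be realized: the tagger's output (edit actions plus rules) is expanded in a second pass that fills each slot with a context span. All names and data structures here are hypothetical; the actual HCT uses learned taggers and span pointers rather than hand-written tags.

```python
# A minimal sketch of HCT-style two-stage decoding (illustrative only).
from dataclasses import dataclass
from typing import List

@dataclass
class RuleTag:
    """An edit attached to a source token: KEEP/DELETE, or a slotted
    rule such as 'besides _' whose slot is filled in a second stage."""
    action: str                 # "KEEP", "DELETE", or "RULE"
    rule: str = ""              # e.g. "besides _"; "_" marks the slot

def apply_tags(source: List[str],
               tags: List[RuleTag],
               slot_fills: List[List[str]]) -> List[str]:
    """Stage 2: realize the rewrite by expanding each slotted rule with
    a span copied from the dialogue context."""
    out, fills = [], iter(slot_fills)
    for tok, tag in zip(source, tags):
        if tag.action == "RULE":
            span = next(fills)  # context span chosen by the slot filler
            for piece in tag.rule.split():
                out.extend(span if piece == "_" else [piece])
        if tag.action in ("KEEP", "RULE"):
            out.append(tok)
    return out

# Toy example: prepend "besides <span>" to the source turn. Note that
# "besides" is an out-of-context token supplied by the rule itself.
source = "what else do you like".split()
tags = [RuleTag("RULE", "besides _")] + [RuleTag("KEEP")] * 4
print(" ".join(apply_tags(source, tags, [["the", "red", "shirt"]])))
# -> "besides the red shirt what else do you like"
```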
A Causality-Guided Prediction of the TED Talk Ratings from the Speech-Transcripts using Neural Networks
Tanveer, Md Iftekhar, Hasan, Md Kamrul, Gildea, Daniel, Hoque, M. Ehsan
Automated prediction of public speaking performance enables novel systems for tutoring public speaking skills. We use the largest open repository, TED Talks, to predict the ratings provided by online viewers. The dataset contains over 2,200 talk transcripts and associated metadata, including over 5.5 million ratings from spontaneous visitors to the website. We carefully remove the bias present in the dataset (e.g., speakers' reputations or popularity gained through publicity) by modeling the data-generating process with a causal diagram. We use a word-sequence-based recurrent architecture and a dependency-tree-based recursive architecture as the neural networks for predicting the TED talk ratings. Our neural network models predict the ratings with an average F-score of 0.77, substantially outperforming a competitive baseline.
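As a rough illustration of the word-sequence recurrent variant, the sketch below shows a minimal LSTM-based rating predictor in PyTorch. The embedding and hidden sizes, the use of 14 rating categories, and the treatment of each rating as a binary high/low label trained with BCE loss are assumptions for illustration, not the paper's exact setup.

```python
# Minimal sketch of a word-sequence recurrent rating predictor
# (hyperparameters and the binary high/low framing are assumptions).
import torch
import torch.nn as nn

class TalkRater(nn.Module):
    def __init__(self, vocab_size, num_ratings=14, emb=128, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.lstm = nn.LSTM(emb, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_ratings)  # one logit per rating

    def forward(self, token_ids):            # (batch, seq_len)
        h, _ = self.lstm(self.embed(token_ids))
        return self.head(h[:, -1])           # predict from final state

# One binary label per rating category, trained with BCE loss.
model = TalkRater(vocab_size=50_000)
logits = model(torch.randint(0, 50_000, (2, 300)))
loss = nn.BCEWithLogitsLoss()(logits, torch.rand(2, 14).round())
```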
Semantic Neural Machine Translation using AMR
Song, Linfeng, Gildea, Daniel, Zhang, Yue, Wang, Zhiguo, Su, Jinsong
Semantic representations are intuitively useful for machine translation, mainly because they can help enforce meaning preservation and handle the data sparsity (many sentences corresponding to one meaning) of machine translation models. Yet little work has been done on leveraging semantics for neural machine translation (NMT). In this work, we study the usefulness of Abstract Meaning Representation (AMR) for NMT. Experiments on a standard English-to-German dataset show that incorporating AMR as additional knowledge significantly improves a strong attention-based sequence-to-sequence translation model.
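One common way to inject a second knowledge source into an attention-based sequence-to-sequence model is a doubly-attentive decoder, sketched below in PyTorch: the decoder state queries both the source-token memory and a memory of AMR-node states, and the two context vectors are merged. This is a simplification, not the paper's exact architecture (which encodes AMR with a graph recurrent network); all sizes are illustrative.

```python
# Sketch of one doubly-attentive decoder step over two memories:
# source-token states and AMR-node states (illustrative only).
import torch
import torch.nn as nn

class DualAttentionStep(nn.Module):
    def __init__(self, hidden=256):
        super().__init__()
        self.seq_attn = nn.MultiheadAttention(hidden, 4, batch_first=True)
        self.amr_attn = nn.MultiheadAttention(hidden, 4, batch_first=True)
        self.merge = nn.Linear(2 * hidden, hidden)

    def forward(self, dec_state, seq_states, amr_states):
        # dec_state: (batch, 1, hidden) acts as the query for both memories
        c_seq, _ = self.seq_attn(dec_state, seq_states, seq_states)
        c_amr, _ = self.amr_attn(dec_state, amr_states, amr_states)
        # Combine the two context vectors into one decoding signal.
        return torch.tanh(self.merge(torch.cat([c_seq, c_amr], dim=-1)))
```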
Exploring Graph-structured Passage Representation for Multi-hop Reading Comprehension with Graph Neural Networks
Song, Linfeng, Wang, Zhiguo, Yu, Mo, Zhang, Yue, Florian, Radu, Gildea, Daniel
Multi-hop reading comprehension focuses on one type of factoid question, where a system needs to properly integrate multiple pieces of evidence to correctly answer a question. Previous work approximates global evidence with local coreference information, encoding coreference chains with DAG-styled GRU layers within a gated-attention reader. However, coreference alone provides limited information for rich inference. We introduce a new method for better connecting global evidence, which forms more complex graphs than DAGs. To perform evidence integration on our graphs, we investigate two recent graph neural networks: the graph convolutional network (GCN) and the graph recurrent network (GRN). Experiments on two standard datasets show that richer global information leads to better answers. Our method outperforms all published results on these datasets.
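To make the evidence-integration step concrete, the sketch below shows a single GCN layer of the kind investigated: each node in the evidence graph updates its representation from the mean of its neighbors' states. The adjacency construction and dimensions are assumptions for illustration, not the paper's exact formulation.

```python
# Minimal sketch of one GCN layer over an evidence graph: mention nodes
# aggregate their neighbors' states (adjacency built from, e.g.,
# coreference and same-entity links; details are assumptions).
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    def __init__(self, hidden=256):
        super().__init__()
        self.proj = nn.Linear(hidden, hidden)

    def forward(self, node_states, adj):
        # node_states: (num_nodes, hidden); adj: (num_nodes, num_nodes)
        deg = adj.sum(-1, keepdim=True).clamp(min=1)  # degree normalizer
        neighbor_mean = (adj @ node_states) / deg
        # Residual update keeps each node's own evidence in the mix.
        return torch.relu(node_states + self.proj(neighbor_mean))
```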
AMR Parsing With Cache Transition Systems
Peng, Xiaochang (University of Rochester) | Gildea, Daniel (University of Rochester) | Satta, Giorgio (University of Padua)
In this paper, we present a transition system that generalizes transition-based dependency parsing techniques to generate AMR graphs rather than tree structures. In addition to a buffer and a stack, we use a fixed-size cache, and allow the system to build arcs to any vertices present in the cache at the same time. The size of the cache provides a parameter that can trade off between the complexity of the graphs that can be built and the ease of predicting actions during parsing. Our results show that a cache transition system can cover almost all AMR graphs with a small cache size, and our end-to-end system achieves competitive results in comparison with other transition-based approaches for AMR parsing.
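The configuration of a cache transition system can be sketched with a small skeleton. The Python below is a simplified illustration of the buffer/stack/cache bookkeeping, not the paper's full action set; the key point it shows is that arcs can be built between any two vertices currently in the cache, which is what allows graphs rather than trees.

```python
# Toy skeleton of a cache transition configuration (simplified).
from collections import deque

class CacheConfig:
    def __init__(self, tokens, cache_size=3):
        self.buffer = deque(tokens)       # unprocessed input
        self.cache = [None] * cache_size  # fixed-size working set of vertices
        self.stack = []                   # displaced cache cells, restored on pop
        self.edges = []                   # AMR edges built so far

    def shift(self, cache_pos):
        """Displace the vertex at cache_pos onto the stack, then bring
        the next buffer item into that cache cell."""
        self.stack.append((cache_pos, self.cache[cache_pos]))
        self.cache[cache_pos] = self.buffer.popleft()

    def arc(self, i, j, label):
        """Build an edge between any two vertices currently in the
        cache -- this is what permits graph (not just tree) structure."""
        self.edges.append((self.cache[i], label, self.cache[j]))

    def pop(self):
        """Restore the most recently displaced vertex to its cache cell."""
        pos, vertex = self.stack.pop()
        self.cache[pos] = vertex
```

A larger cache admits more complex graphs but gives the parser more actions to choose among, which is the coverage/predictability trade-off the abstract describes.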
Unsupervised Alignment of Natural Language Instructions with Video Segments
Naim, Iftekhar (University of Rochester) | Song, Young Chol (University of Rochester) | Liu, Qiguang (University of Rochester) | Kautz, Henry (University of Rochester) | Luo, Jiebo (University of Rochester) | Gildea, Daniel (University of Rochester)
We propose an unsupervised learning algorithm for automatically inferring the mappings between English nouns and corresponding video objects. Given a sequence of natural language instructions and an unaligned video recording, we simultaneously align each instruction to its corresponding video segment and align the nouns in each instruction to their corresponding objects in the video. While existing grounded language acquisition algorithms rely on pre-aligned supervised data (each sentence paired with a corresponding image frame or video segment), our algorithm aims to infer the alignment automatically from the temporal structure of the video and the parallel text instructions. We propose two generative models that are closely related to the HMM and IBM Model 1 word alignment models used in statistical machine translation. We evaluate our algorithm on videos of biological experiments performed in wet labs and demonstrate its capability to align video segments with text instructions and match video objects to nouns in the absence of any direct supervision.
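The IBM Model 1 flavor of this idea reduces to a small EM loop over expected noun-object co-occurrence counts. The sketch below shows that general technique, not the paper's exact latent-variable model (which must also infer the instruction-to-segment alignment); the toy data are invented.

```python
# Minimal IBM Model 1-style EM for noun/object translation probabilities.
from collections import defaultdict

def ibm1(pairs, iterations=10):
    """pairs: list of (nouns, objects) for paired instruction/segment
    observations; returns t[noun][obj], an estimate of P(noun | obj)."""
    t = defaultdict(lambda: defaultdict(lambda: 1.0))  # uniform start
    for _ in range(iterations):
        count = defaultdict(lambda: defaultdict(float))
        total = defaultdict(float)
        for nouns, objs in pairs:                # E-step: expected counts
            for n in nouns:
                z = sum(t[n][o] for o in objs)
                for o in objs:
                    count[n][o] += t[n][o] / z
                    total[o] += t[n][o] / z
        for n in count:                          # M-step: renormalize
            for o in count[n]:
                t[n][o] = count[n][o] / total[o]
    return t

pairs = [(["pipette", "tube"], ["obj1", "obj2"]),
         (["tube"], ["obj2"])]
t = ibm1(pairs)
print(max(t["tube"], key=t["tube"].get))  # co-occurrence favors "obj2"
```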
Integrating Programming by Example and Natural Language Programming
Manshadi, Mehdi H. (University of Rochester) | Gildea, Daniel (University of Rochester) | Allen, James F. (University of Rochester)
We motivate the integration of programming by example and natural language programming by developing a system that specifies programs for simple text editing operations based on regular expressions. Programs are described with unconstrained natural language instructions together with one or more input/output examples. We show that natural language allows the system to deduce the correct program much more often and much faster than is possible with the input/output example(s) alone. Natural language programming and programming by example can thus be combined in a way that overcomes the ambiguities each method suffers from individually, while providing a more natural interface to the user.
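The combination can be illustrated with a tiny sketch: candidate programs suggested by the natural language reading are filtered against the user's input/output examples, so each modality resolves the other's ambiguity. The regexes and example below are illustrative inventions, not output of the paper's system.

```python
# Sketch: I/O examples disambiguate between NL-suggested candidates.
import re

def consistent(program, examples):
    """Keep a candidate only if it reproduces every I/O example."""
    return all(program(inp) == out for inp, out in examples)

# Hypothetical candidates for "delete the numbers at the end of lines":
candidates = [
    lambda s: re.sub(r"\d+$", "", s),   # strip only trailing digits
    lambda s: re.sub(r"\d+", "", s),    # strip all digits (too broad)
]
examples = [("item 12 costs 5", "item 12 costs ")]
survivors = [p for p in candidates if consistent(p, examples)]
assert len(survivors) == 1  # the example rules out the broad reading
```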
Convergence of the EM Algorithm for Gaussian Mixtures with Unbalanced Mixing Coefficients
Naim, Iftekhar, Gildea, Daniel
The speed of convergence of the Expectation Maximization (EM) algorithm for Gaussian mixture model fitting is known to depend on the amount of overlap among the mixture components. In this paper, we study the impact of mixing coefficients on the convergence of EM. We show that when the mixture components exhibit some overlap, the convergence of EM becomes slower as the dynamic range among the mixing coefficients increases. We propose a deterministic anti-annealing algorithm that significantly improves the speed of convergence of EM for mixtures with unbalanced mixing coefficients. The proposed algorithm is compared against standard optimization techniques such as BFGS and conjugate gradient, as well as the traditional EM algorithm. Finally, we propose a similar deterministic anti-annealing algorithm for the Dirichlet process mixture model and demonstrate its advantages over the conventional variational Bayesian approach.
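The core modification can be sketched as an E-step in which component likelihoods are raised to a power beta before normalization: deterministic annealing keeps beta below 1, while the anti-annealing idea lets beta exceed 1, sharpening the posteriors. The code and schedule below are an illustrative sketch, not the paper's exact algorithm.

```python
# Sketch of an annealed E-step for a Gaussian mixture: responsibilities
# are computed from component likelihoods raised to a power beta.
# beta = 1 recovers plain EM; beta > 1 is the anti-annealing regime.
import numpy as np
from scipy.stats import multivariate_normal as mvn

def annealed_e_step(X, weights, means, covs, beta):
    # Unnormalized component likelihoods for each point, shape (N, K).
    lik = np.stack([w * mvn.pdf(X, m, c)
                    for w, m, c in zip(weights, means, covs)], axis=1)
    resp = lik ** beta                 # temper the posteriors
    return resp / resp.sum(axis=1, keepdims=True)

# Illustrative schedule (an assumption): anneal up, overshoot past 1,
# then settle at beta = 1, running EM updates to convergence at each step.
schedule = [0.8, 1.0, 1.5, 1.2, 1.0]
```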