Machine Translation
Evaluating Pronominal Anaphora in Machine Translation: An Evaluation Measure and a Test Suite
Jwalapuram, Prathyusha, Joty, Shafiq, Temnikova, Irina, Nakov, Preslav
The ongoing neural revolution in machine translation has made it easier to model larger contexts beyond the sentence level, which can potentially help resolve some discourse-level ambiguities such as pronominal anaphora, thus enabling better translations. Unfortunately, even when the resulting improvements are seen as substantial by humans, they remain virtually unnoticed by traditional automatic evaluation measures like BLEU, as only a few words end up being affected. Thus, specialized evaluation measures are needed. With this aim in mind, we contribute an extensive, targeted dataset that can be used as a test suite for pronoun translation into English, covering multiple source languages and different pronoun errors drawn from real system translations. We further propose an evaluation measure to differentiate good and bad pronoun translations. We also conduct a user study to report correlations with human judgments.
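To illustrate why a single wrong pronoun barely moves an n-gram overlap score, here is a minimal sketch of clipped n-gram precision, the building block of BLEU. It is not the full BLEU metric (no brevity penalty, no geometric mean over n-gram orders) and it is not the evaluation measure proposed in the paper. The function name and the example sentences are hypothetical.

```python
from collections import Counter

def ngram_precision(hypothesis, reference, n):
    """Clipped n-gram precision of a hypothesis against one reference."""
    hyp = hypothesis.split()
    ref = reference.split()
    hyp_counts = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
    ref_counts = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    # Each hypothesis n-gram is credited at most as often as it occurs in the reference.
    matched = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
    total = sum(hyp_counts.values())
    return matched / total if total else 0.0

# Hypothetical example: the two hypotheses differ only in the pronoun.
reference = "the committee reviewed the proposal and decided that it was ready"
wrong_pronoun = "the committee reviewed the proposal and decided that he was ready"

unigram = ngram_precision(wrong_pronoun, reference, 1)  # 10 of 11 tokens match
bigram = ngram_precision(wrong_pronoun, reference, 2)   # 8 of 10 bigrams match
```

Only one token differs, so the score changes only through the n-grams that contain that token. Over a whole test set, where pronouns are a small fraction of all tokens, the change is diluted further.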
Facebook founds AI Language Research Consortium to solve challenges in natural language processing
Roughly three months ago, Facebook launched calls for research proposals in three subfields of natural language processing (NLP), the cross-disciplinary study of linguistics and AI concerned with computer-language interactions. It specifically sought "robust" deep learning approaches for NLP and computationally efficient NLP in addition to neural machine translation for low-resource dialects, ultimately in the pursuit of advancing cutting-edge research in machine translation. That was just the start, it would seem. In a blog post today announcing 11 winning proposals among the 115 submitted from 35 countries, Facebook announced the AI Language Research Consortium, a community of partners it says will "work together to advance priority research areas" in NLP. Details were tough to come by at press time, but Facebook says the newly formed group will foster collaboration to tackle challenging tasks like representation learning, content understanding, dialog systems, information extraction, sentiment analysis, summarization, data collection and cleaning, and speech translation.
Transformer Dissection: An Unified Understanding for Transformer's Attention via the Lens of Kernel
Tsai, Yao-Hung Hubert, Bai, Shaojie, Yamada, Makoto, Morency, Louis-Philippe, Salakhutdinov, Ruslan
Transformer is a powerful architecture that achieves superior performance on various sequence learning tasks, including neural machine translation, language understanding, and sequence prediction. At the core of the Transformer is the attention mechanism, which concurrently processes all inputs in the streams. In this paper, we present a new formulation of attention via the lens of the kernel. To be more precise, we realize that the attention can be seen as applying a kernel smoother over the inputs, with the kernel scores being the similarities between inputs. This new formulation gives us a better way to understand individual components of the Transformer's attention, such as a better way to integrate the positional embedding. Another important advantage of our kernel-based formulation is that it paves the way to a larger space of composing Transformer's attention. As an example, we propose a new variant of Transformer's attention which models the input as a product of symmetric kernels. This approach achieves performance competitive with the current state-of-the-art model with less computation. In our experiments, we empirically study different kernel construction strategies on two widely used tasks: neural machine translation and sequence prediction.
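The kernel-smoother reading of attention can be sketched in a few lines. This is a minimal illustration, assuming the standard scaled dot-product attention with an exponential kernel, where each output is a weighted average of the values and the weights are normalized kernel scores between a query and the keys. It is not the paper's proposed variant, and the function name is hypothetical.

```python
import math

def kernel_smoother_attention(queries, keys, values):
    """Return, for each query, a kernel-weighted average of the values.

    Kernel assumed here: k(q, key) = exp(q . key / sqrt(d)), i.e. the
    unnormalized softmax score of scaled dot-product attention.
    """
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        # Subtracting the max rescales every kernel value by the same constant,
        # which cancels in the normalization below and avoids overflow.
        top = max(scores)
        kernel = [math.exp(s - top) for s in scores]
        total = sum(kernel)
        weights = [w / total for w in kernel]
        # Smoothed output: weighted average of the value vectors.
        dim = len(values[0])
        outputs.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)])
    return outputs

keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[2.0], [4.0]]
# A zero query is equally similar to every key, so the output is the plain mean of the values.
uniform = kernel_smoother_attention([[0.0, 0.0]], keys, values)
# A query aligned with the first key pulls the output toward the first value.
peaked = kernel_smoother_attention([[10.0, 0.0]], keys, values)
```

Swapping the exponential dot-product kernel for another similarity function changes the attention variant while leaving the weighted-average structure intact.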
Latent Part-of-Speech Sequences for Neural Machine Translation
Yang, Xuewen, Liu, Yingru, Xie, Dongliang, Wang, Xin, Balasubramanian, Niranjan
Learning target-side syntactic structure has been shown to improve Neural Machine Translation (NMT). However, incorporating syntax through latent variables introduces additional complexity in inference, as the models need to marginalize over the latent syntactic structures. To avoid this, models often resort to greedy search, which only allows them to explore a limited portion of the latent space. In this work, we introduce a new latent variable model, LaSyn, that captures the co-dependence between syntax and semantics, while allowing for effective and efficient inference over the latent space. LaSyn decouples direct dependence between successive latent variables, which allows its decoder to exhaustively search through the latent syntactic choices, while keeping decoding speed proportional to the size of the latent variable vocabulary. We implement LaSyn by modifying a transformer-based NMT system and design a neural expectation maximization algorithm that we regularize with part-of-speech information as the latent sequences. Evaluations on four different MT tasks show that incorporating target-side syntax with LaSyn improves translation quality and also provides an opportunity to improve diversity.
Real-World Natural Language Processing: applied NLP
Take 42% off by entering slhagiwara into the discount code box at checkout at manning.com. Natural language processing (NLP) is a set of tools and algorithms that help computers extract meaning from text. Real-World Natural Language Processing teaches you how to create practical NLP applications without getting bogged down in complex language theory and the mathematics of deep learning. In it, you'll explore the core tools and techniques required to build a huge range of powerful NLP apps to help computers better understand humans. I saw a girl with a telescope… How's a computer to know which is right?
HARE: a Flexible Highlighting Annotator for Ranking and Exploration
Newman-Griffis, Denis, Fosler-Lussier, Eric
Exploration and analysis of potential data sources is a significant challenge in the application of NLP techniques to novel information domains. We describe HARE, a system for highlighting relevant information in document collections to support ranking and triage, which provides tools for post-processing and qualitative analysis for model development and tuning. We apply HARE to the use case of narrative descriptions of mobility information in clinical data, and demonstrate its utility in comparing candidate embedding features. We provide a web-based interface for annotation visualization and document ranking, with a modular backend to support interoperability with existing annotation tools. Our system is available online at https://github.com/OSU-slatelab/HARE.
Machine Translation & Text Analytics: Friends or Foes?
Government agencies face similar challenges when it comes to understanding, and gaining intelligence from, foreign language content. They need to process, manage and gain insight from large volumes of content locked away in different formats, often across multiple languages. And they need to do all of this as quickly as possible. It's no mean feat when you consider the mind-boggling amounts of content being generated: 90% of the world's content was created over the past two years alone. Machine translation and text analytics have always been regarded as the two main ways for organizations and agencies to tackle this challenge.
What makes a good conversation?
This blog post is about the NAACL 2019 paper What makes a good conversation? How controllable attributes affect human judgments by Abigail See, Stephen Roller, Douwe Kiela and Jason Weston. On the left are tasks like Machine Translation (MT), which are less open-ended (i.e. Given the close correspondence between input and output, these tasks can be accomplished mostly (but not entirely) by decisions at the word/phrase level. On the right are tasks like Story Generation and Chitchat Dialogue, which are more open-ended (i.e. For these tasks, the ability to make high-level decisions (e.g.
DeepCopy: Grounded Response Generation with Hierarchical Pointer Networks
Yavuz, Semih, Rastogi, Abhinav, Chao, Guan-Lin, Hakkani-Tur, Dilek
Recent advances in neural sequence-to-sequence models have led to promising results for several language generation-based tasks, including dialogue response generation, summarization, and machine translation. However, these models are known to have several problems, especially in the context of chit-chat based dialogue systems: they tend to generate short and dull responses that are often too generic. Furthermore, these models do not ground conversational responses in knowledge and facts, resulting in turns that are not accurate, informative and engaging for the users. In this paper, we propose and experiment with a series of response generation models that aim to serve in the general scenario where, in addition to the dialogue context, relevant unstructured external knowledge in the form of text is also assumed to be available for models to harness. Our proposed approach extends pointer-generator networks (See et al., 2017) by allowing the decoder to hierarchically attend and copy from external knowledge in addition to the dialogue context. We empirically show the effectiveness of the proposed model compared to several baselines including (Ghazvininejad et al., 2018; Zhang et al., 2018) through both automatic evaluation metrics and human evaluation on the CONVAI2 dataset.
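As background for the copy mechanism mentioned above, the following is a minimal sketch of the single-source pointer-generator mixture from See et al. (2017): the final word distribution blends the decoder's vocabulary distribution with attention mass copied from source tokens, weighted by a generation probability. It does not implement the hierarchical attention over external knowledge that DeepCopy adds. The function name, argument names, and example values are hypothetical.

```python
from collections import defaultdict

def pointer_generator_distribution(p_gen, vocab_probs, attention, source_tokens):
    """Mix generating from the vocabulary with copying from the source.

    P(w) = p_gen * P_vocab(w) + (1 - p_gen) * sum of attention on source positions holding w
    """
    final = defaultdict(float)
    # Generation path: scale the decoder's vocabulary distribution by p_gen.
    for word, prob in vocab_probs.items():
        final[word] += p_gen * prob
    # Copy path: each source token receives its attention weight, scaled by (1 - p_gen).
    for weight, token in zip(attention, source_tokens):
        final[token] += (1.0 - p_gen) * weight
    return dict(final)

# Hypothetical example: "Paris" is out of vocabulary but can still be produced by copying.
dist = pointer_generator_distribution(
    p_gen=0.6,
    vocab_probs={"the": 0.5, "cat": 0.5},
    attention=[0.25, 0.75],
    source_tokens=["cat", "Paris"],
)
```

Because both input distributions sum to one, the mixture does too, and words that appear only in the source receive probability purely through the copy path.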
On Education Deep Learning: Advanced NLP and RNNs - all courses
Build a text classification system (can be used for spam detection, sentiment analysis, and similar problems). Build a neural machine translation system (can also be used for chatbots and question answering). Build a sequence-to-sequence (seq2seq) model. Build an attention model. Build a memory network (for question answering based on stories). Understand what deep learning is for and how it is used. Decent Python coding skills, especially tools for data science (NumPy, Matplotlib). Preferable to have experience with RNNs, LSTMs, and GRUs. Preferable to have experience with Keras. Preferable to understand word embeddings. It's hard to believe it's been over a year since I released my first course on Deep Learning with NLP (natural language processing). A lot of cool stuff has happened since then, and I've been deep in the trenches learning, researching, and accumulating the best and most useful ideas to bring them back to you. So what is this course all about, and how have things changed since then? In previous courses, you learned about some of the fundamental building blocks of Deep NLP. We looked at RNNs (recurrent neural networks), CNNs (convolutional neural networks), and word embedding algorithms such as word2vec and GloVe.