Machine Translation
TSPNet: Hierarchical Feature Learning via Temporal Semantic Pyramid for Sign Language Translation
Li, Dongxu, Xu, Chenchen, Yu, Xin, Zhang, Kaihao, Swift, Ben, Suominen, Hanna, Li, Hongdong
Sign language translation (SLT) aims to interpret sign video sequences into textbased natural language sentences. Sign videos consist of continuous sequences of sign gestures with no clear boundaries in between. Existing SLT models usually represent sign visual features in a frame-wise manner so as to avoid needing to explicitly segmenting the videos into isolated signs. However, these methods neglect the temporal information of signs and lead to substantial ambiguity in translation. In this paper, we explore the temporal semantic structures of sign videos to learn more discriminative features. To this end, we first present a novel sign video segment representation which takes into account multiple temporal granularities, thus alleviating the need for accurate video segmentation. Taking advantage of the proposed segment representation, we develop a novel hierarchical sign video feature learning method via a temporal semantic pyramid network, called TSPNet. Specifically, TSPNet introduces an inter-scale attention to evaluate and enhance local semantic consistency of sign segments and an intra-scale attention to resolve semantic ambiguity by using non-local video context. Experiments show that our TSPNet outperforms the state-of-the-art with significant improvements on the BLEU score (from 9.58 to 13.41) and ROUGE score (from 31.80 to 34.96) on the largest commonly-used SLT dataset.
It's not a Non-Issue: Negation as a Source of Error in Machine Translation
Hossain, Md Mosharaf, Anastasopoulos, Antonios, Blanco, Eduardo, Palmer, Alexis
As machine translation (MT) systems progress at a rapid pace, questions of their adequacy linger. In this study we focus on negation, a universal, core property of human language that significantly affects the semantics of an utterance. We investigate whether translating negation is an issue for modern MT systems using 17 translation directions as test bed. Through thorough analysis, we find that indeed the presence of negation can significantly impact downstream quality, in some cases resulting in quality reductions of more than 60%. We also provide a linguistically motivated analysis that directly explains the majority of our findings. We release our annotations and code to replicate our analysis here: https://github.com/mosharafhossain/negation-mt.
Lexically Cohesive Neural Machine Translation with Copy Mechanism
Mishra, Vipul, Chu, Chenhui, Arase, Yuki
Lexically cohesive translations preserve consistency in word choices in document-level translation. We employ a copy mechanism into a context-aware neural machine translation model to allow copying words from previous translation outputs. Different from previous context-aware neural machine translation models that handle all the discourse phenomena implicitly, our model explicitly addresses the lexical cohesion problem by boosting the probabilities to output words consistently. We conduct experiments on Japanese to English translation using an evaluation dataset for discourse translation. The results showed that the proposed model significantly improved lexical cohesion compared to previous context-aware models.
On the Computational Power of Transformers and its Implications in Sequence Modeling
Bhattamishra, Satwik, Patel, Arkil, Goyal, Navin
Transformers are being used extensively across several sequence modeling tasks. Significant research effort has been devoted to experimentally probe the inner workings of Transformers. However, our conceptual and theoretical understanding of their power and inherent limitations is still nascent. In particular, the roles of various components in Transformers such as positional encodings, attention heads, residual connections, and feedforward networks, are not clear. In this paper, we take a step towards answering these questions. We analyze the computational power as captured by Turing-completeness. We first provide an alternate and simpler proof to show that vanilla Transformers are Turing-complete and then we prove that Transformers with only positional masking and without any positional encoding are also Turing-complete. We further analyze the necessity of each component for the Turing-completeness of the network; interestingly, we find that a particular type of residual connection is necessary. We demonstrate the practical implications of our results via experiments on machine translation and synthetic tasks.
Cue-word Driven Neural Response Generation with a Shrinking Vocabulary
Wang, Qiansheng, Liu, Yuxin, Lv, Chengguo, Wang, Zhen, Fu, Guohong
Open-domain response generation is the task of generating sensible and informative re-sponses to the source sentence. However, neural models tend to generate safe and mean-ingless responses. While cue-word introducing approaches encourage responses with concrete semantics and have shown tremendous potential, they still fail to explore di-verse responses during decoding. In this paper, we propose a novel but natural approach that can produce multiple cue-words during decoding, and then uses the produced cue-words to drive decoding and shrinks the decoding vocabulary. Thus the neural genera-tion model can explore the full space of responses and discover informative ones with efficiency. Experimental results show that our approach significantly outperforms several strong baseline models with much lower decoding complexity. Especially, our approach can converge to concrete semantics more efficiently during decoding.
On Long-Tailed Phenomena in Neural Machine Translation
Raunak, Vikas, Dalmia, Siddharth, Gupta, Vivek, Metze, Florian
State-of-the-art Neural Machine Translation (NMT) models struggle with generating low-frequency tokens, tackling which remains a major challenge. The analysis of long-tailed phenomena in the context of structured prediction tasks is further hindered by the added complexities of search during inference. In this work, we quantitatively characterize such long-tailed phenomena at two levels of abstraction, namely, token classification and sequence generation. We propose a new loss function, the Anti-Focal loss, to better adapt model training to the structural dependencies of conditional text generation by incorporating the inductive biases of beam search in the training process. We show the efficacy of the proposed technique on a number of Machine Translation (MT) datasets, demonstrating that it leads to significant gains over cross-entropy across different language pairs, especially on the generation of low-frequency words. We have released the code to reproduce our results.
ChrEn: Cherokee-English Machine Translation for Endangered Language Revitalization
Zhang, Shiyue, Frey, Benjamin, Bansal, Mohit
Cherokee is a highly endangered Native American language spoken by the Cherokee people. The Cherokee culture is deeply embedded in its language. However, there are approximately only 2,000 fluent first language Cherokee speakers remaining in the world, and the number is declining every year. To help save this endangered language, we introduce ChrEn, a Cherokee-English parallel dataset, to facilitate machine translation research between Cherokee and English. Compared to some popular machine translation language pairs, ChrEn is extremely low-resource, only containing 14k sentence pairs in total. We split our parallel data in ways that facilitate both in-domain and out-of-domain evaluation. We also collect 5k Cherokee monolingual data to enable semi-supervised learning. Besides these datasets, we propose several Cherokee-English and English-Cherokee machine translation systems. We compare SMT (phrase-based) versus NMT (RNN-based and Transformer-based) systems; supervised versus semi-supervised (via language model, back-translation, and BERT/Multilingual-BERT) methods; as well as transfer learning versus multilingual joint training with 4 other languages. Our best results are 15.8/12.7 BLEU for in-domain and 6.5/5.0 BLEU for out-of-domain Chr-En/EnChr translations, respectively, and we hope that our dataset and systems will encourage future work by the community for Cherokee language revitalization. Our data, code, and demo will be publicly available at https://github.com/ZhangShiyue/ChrEn
How Companies Can Create Responsible and Transparent AI – Thought Leaders
Sundar Pichai, CEO of Google parent company Alphabet, has described developments in AI as "more profound than fire or electricity," and COVID-19 has brought fresh urgency in unleashing this technology's promise. Applications of AI are now firmly in the spotlight, improving COVID treatments, tracing potential COVID carriers, and deploying real-time chatbots for supply-stricken users of retail websites. These applications have shown that AI improves a business's resilience and benefits broader society. So along with "cloud-native," the buzzword of the last quarter might just be "AI-first transformation," a term that industry practitioners believe will hold true even after COVID goes away. For many firms, the promise of lower costs (i.e., supply chain algorithms that match supply with demand) and admirable boosts in productivity (i.e., when banks use document and identity verification in real time) is just too good to ignore. In AI-first transformation, an enterprise uses AI as a North Star, working to use it not only intelligently but also in a way that influences decisions made by people, processes, and systems at scale.
Query-Key Normalization for Transformers
Henry, Alex, Dachapally, Prudhvi Raj, Pawar, Shubham, Chen, Yuxuan
Low-resource language translation is a challenging but socially valuable NLP task. Building on recent work adapting the Transformer's normalization to this setting, we propose QKNorm, a normalization technique that modifies the attention mechanism to make the softmax function less prone to arbitrary saturation without sacrificing expressivity. Specifically, we apply $\ell_2$ normalization along the head dimension of each query and key matrix prior to multiplying them and then scale up by a learnable parameter instead of dividing by the square root of the embedding dimension. We show improvements averaging 0.928 BLEU over state-of-the-art bilingual benchmarks for 5 low-resource translation pairs from the TED Talks corpus and IWSLT'15.
Machine Translation for Manufacturing: A Case Study at Ford Motor Company
Machine translation (MT) was one of the first applications of artificial intelligence technology that was deployed to solve real-world problems. Since the early 1960s, researchers have been building and utilizing computer systems that can translate from one language to another without requiring extensive human intervention. In the late 1990s, Ford Vehicle Operations began working with Systran Software Inc. to adapt and customize its machine-translation technology in order to translate Ford's vehicle assembly build instructions from English to German, Spanish, Dutch, and Portuguese. The use of machine translation was made necessary by the vast amount of dynamic information that needed to be translated in a timely fashion. The assembly build instructions at Ford contain text written in a controlled language as well as unstructured remarks and comments.