The field of machine translation (MT), the automatic translation of written text from one natural language into another, has experienced a major paradigm shift in recent years. Statistical MT, which mainly relies on various count-based models and which dominated MT research for decades, has largely been superseded by neural machine translation (NMT), which tackles translation with a single neural network. In this work we will trace the origins of modern NMT architectures back to word and sentence embeddings and earlier examples of the encoder-decoder network family. We will conclude with a short survey of more recent trends in the field.
Machines show an increasingly broad set of linguistic competencies, thanks to recent progress in Natural Language Processing (NLP). Many algorithms stem from past computational work in psychology, raising the question of whether they understand words as people do. In this paper, we compare how humans and machines represent the meaning of words. We argue that contemporary NLP systems are promising models of human word similarity, but they fall short in many other respects. Current models are too strongly linked to the text-based patterns in large corpora, and too weakly linked to the desires, goals, and beliefs that people use words in order to express. Word meanings must also be grounded in vision and action, and capable of flexible combinations, in ways that current systems are not. We pose concrete challenges for developing machines with a more human-like, conceptual basis for word meaning. We also discuss implications for cognitive science and NLP.
A variety of natural language tasks require processing of textual data which contains a mix of natural language and formal languages such as mathematical expressions. In this paper, we take unit conversions as an example and propose a data augmentation technique which leads to models learning both translation and conversion tasks as well as how to adequately switch between them for end-to-end localization.
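The idea of augmenting training data with unit conversions can be illustrated with a minimal sketch. It is not the authors' technique as implemented: the conversion table, the English-German templates, the two-decimal rounding, and the function names `make_pair` and `augment` are all assumptions made for illustration, and locale-specific number formatting is left out.

```python
import random

# Hypothetical conversion table: source unit -> (target unit name, factor).
CONVERSIONS = {
    "miles": ("Kilometer", 1.609344),
    "pounds": ("Kilogramm", 0.45359237),
}

# Hypothetical templates: (source unit, source template, target template).
TEMPLATES = [
    ("miles", "The trail is {value} {unit} long.", "Der Weg ist {value} {unit} lang."),
    ("pounds", "The box weighs {value} {unit}.", "Die Kiste wiegt {value} {unit}."),
]


def make_pair(src_template, tgt_template, value, src_unit):
    """Fill one template pair, converting the quantity on the target side."""
    tgt_unit, factor = CONVERSIONS[src_unit]
    src = src_template.format(value=value, unit=src_unit)
    # Rounding to two decimals is an arbitrary choice for this sketch.
    tgt = tgt_template.format(value=f"{value * factor:.2f}", unit=tgt_unit)
    return src, tgt


def augment(n, seed=0):
    """Generate n synthetic (source, target) pairs with random quantities."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        src_unit, src_template, tgt_template = rng.choice(TEMPLATES)
        value = rng.randint(1, 100)
        pairs.append(make_pair(src_template, tgt_template, value, src_unit))
    return pairs
```

For example, `make_pair(TEMPLATES[0][1], TEMPLATES[0][2], 5, "miles")` pairs "The trail is 5 miles long." with "Der Weg ist 8.05 Kilometer lang.", so a model trained on such pairs sees translation and conversion in the same example.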
New machine translation (MT) technologies are emerging rapidly, and with them bold claims of achieving human parity, such as (i) that the results produced approach the "accuracy achieved by average bilingual human translators" (Wu et al., 2017b) or (ii) that the "translation quality is at human parity when compared to professional human translators" (Hassan et al., 2018), have seen the light of day (Laubli et al., 2018). Aside from the fact that many of these papers craft their own definition of human parity, these sensational claims are often not supported by a complete analysis of all aspects involved in translation. Establishing the discrepancies between the strengths of statistical approaches to MT and the way humans translate has been the starting point of our research. By looking at MT output and linguistic theory, we were able to identify some remaining issues. The problems range from simple number and gender agreement errors to more complex phenomena such as the correct translation of aspectual values and tenses. Our experiments confirm, along with other studies (Bentivogli et al., 2016), that neural MT has surpassed statistical MT in many aspects. However, some problems remain and others have emerged. We cover a series of problems related to the integration of specific linguistic features into statistical and neural MT, aiming to analyse and provide a solution to some of them. Our work focuses on addressing three main research questions that revolve around the complex relationship between linguistics and MT in general. We identify linguistic information that is lacking in order for automatic translation systems to produce more accurate translations and integrate additional features into the existing pipelines. We identify overgeneralization or 'algorithmic bias' as a potential drawback of neural MT and link it to many of the remaining linguistic issues.
Measuring the quality of a generated sequence against a set of references is a central problem in many learning frameworks, be it to compute a score, to assign a reward, or to perform discrimination. Despite great advances in model architectures, metrics that scale independently of the number of references are still based on n-gram estimates. We show that the underlying operations, counting words and comparing counts, can be lifted to embedding words and comparing embeddings. An in-depth analysis of BERT embeddings shows empirically that contextual embeddings can be employed to capture the required dependencies while maintaining the necessary scalability through appropriate pruning and smoothing techniques. We cast unconditional generation as a reinforcement learning problem and show that our reward function indeed provides a more effective learning signal than n-gram reward in this challenging setting.
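The step from "counting words and comparing counts" to "embedding words and comparing embeddings" can be sketched as follows. This is a simplified illustration and not the metric proposed in the paper: `ngram_precision` is a standard clipped n-gram precision, while `soft_precision` is an assumed greedy cosine-matching analogue that uses static toy vectors in place of contextual BERT embeddings and omits the pruning and smoothing mentioned above. The `EMBED` table is hypothetical.

```python
import math
from collections import Counter


def ngram_precision(candidate, references, n=1):
    """Clipped n-gram precision: count candidate n-grams and clip each count
    by the maximum count of that n-gram in any single reference."""
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand = ngrams(candidate)
    if not cand:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref).items():
            max_ref[gram] = max(max_ref[gram], count)
    matched = sum(min(count, max_ref[gram]) for gram, count in cand.items())
    return matched / sum(cand.values())


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def soft_precision(candidate, references, embed):
    """Embedding analogue (assumed form): each candidate token is scored by
    its best cosine match among all reference tokens, then averaged."""
    ref_vecs = [embed[tok] for ref in references for tok in ref]
    if not candidate or not ref_vecs:
        return 0.0
    best = [max(cosine(embed[tok], r) for r in ref_vecs) for tok in candidate]
    return sum(best) / len(candidate)


# Hypothetical two-dimensional toy embeddings, for illustration only.
EMBED = {"a": (0.0, 1.0), "cat": (1.0, 0.0), "kitten": (0.9, 0.1)}
```

With the candidate `["a", "kitten"]` and the single reference `["a", "cat"]`, the count-based score gives no credit for "kitten", whereas the embedding-based score rewards it for being close to "cat" in the toy vector space.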
We present a survey on multilingual neural machine translation (MNMT), which has gained a lot of traction in recent years. MNMT has been useful in improving translation quality as a result of translation knowledge transfer (transfer learning). MNMT is more promising and interesting than its statistical machine translation counterpart because end-to-end modeling and distributed representations open new avenues for research on machine translation. Many approaches have been proposed in order to exploit multilingual parallel corpora for improving translation quality. However, the lack of a comprehensive survey makes it difficult to determine which approaches are promising and hence deserve further exploration. In this paper, we present an in-depth survey of existing literature on MNMT. We first categorize various approaches based on their central use-case and then further categorize them based on resource scenarios, underlying modeling principles, core issues and challenges. Wherever possible we address the strengths and weaknesses of several techniques by comparing them with each other. We also discuss the future directions that MNMT research might take. This paper is aimed at both beginners and experts in NMT. We hope this paper will serve as a starting point as well as a source of new ideas for researchers and engineers interested in MNMT.
We describe the design, the evaluation setup, and the results of the 2016 WMT shared task on cross-lingual pronoun prediction. This is a classification task in which participants are asked to provide predictions on what pronoun class label should replace a placeholder value in the target-language text, provided in lemmatised and PoS-tagged form. We provided four subtasks, for the English-French and English-German language pairs, in both directions. Eleven teams participated in the shared task; nine for the English-French subtask, five for French-English, nine for English-German, and six for German-English. Most of the submissions outperformed two strong language-model-based baseline systems, with systems using deep recurrent neural networks outperforming those using other architectures for most language pairs.
We present an extension of our Molecular Transformer architecture combined with a hyper-graph exploration strategy for automatic retrosynthesis route planning without human intervention. The single-step retrosynthetic model sets a new state of the art for predicting reactants as well as reagents, solvents and catalysts for each retrosynthetic step. We introduce new metrics (coverage, class diversity, round-trip accuracy and Jensen-Shannon divergence) to evaluate the single-step retrosynthetic models, using the forward prediction model and a reaction classification model, both likewise based on the transformer architecture. The hypergraph is constructed on the fly, and the nodes are filtered and further expanded based on a Bayesian-like probability. We critically assessed the end-to-end framework with several retrosynthesis examples from literature and academic exams. Overall, the framework performs very well, with few weaknesses due to the bias induced during the training process. The use of the newly introduced metrics opens up the possibility to optimize entire retrosynthetic frameworks through focusing on the performance of the single-step model only.
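Of the metrics named above, the Jensen-Shannon divergence has a standard textbook definition, sketched here for two discrete distributions. The abstract does not say which distributions the authors compare or how, so the inputs below are hypothetical placeholders; the sketch also assumes both inputs are already normalized and uses base-2 logarithms, which bounds the result between 0 and 1.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_i = 0
    contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence of two normalized discrete distributions."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    # Mixture distribution; m_i > 0 wherever p_i > 0 or q_i > 0,
    # so the KL terms below never divide by zero.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical probabilities over three reaction classes, for illustration.
uniform = [1 / 3, 1 / 3, 1 / 3]
skewed = [0.8, 0.1, 0.1]
score = js_divergence(uniform, skewed)
```

Identical distributions give a divergence of 0 and distributions with disjoint support give 1, so a value near 0 indicates that the two distributions place similar weight on each class.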
An increasingly popular approach to alleviate this issue is to first learn general language representations on unlabeled data, which are then integrated in task-specific downstream systems. This approach was first popularized by word embeddings (Mikolov et al., 2013b; Pennington et al., 2014), but has recently been superseded by sentence-level representations (Peters et al., 2018; Devlin et al., 2019). Nevertheless, all these works learn a separate model for each language and are thus unable to leverage information across different languages, greatly limiting their potential performance for low-resource languages. In this work, we are interested in universal language-agnostic sentence embeddings, that is, vector representations of sentences that are general with respect to two dimensions: the input language and the NLP task. (This work was performed during an internship at Facebook AI Research.)
Sentiment analysis benefits from large, hand-annotated resources in order to train and test machine learning models, which are often data hungry. While some languages, e.g., English, have a vast array of these resources, most under-resourced languages do not, especially for fine-grained sentiment tasks, such as aspect-level or targeted sentiment analysis. To improve this situation, we propose a cross-lingual approach to sentiment analysis that is applicable to under-resourced languages and takes into account target-level information. This model incorporates sentiment information into bilingual distributional representations, by jointly optimizing them for semantics and sentiment, showing state-of-the-art performance at sentence-level when combined with machine translation. The adaptation to targeted sentiment analysis on multiple domains shows that our model outperforms other projection-based bilingual embedding methods on binary targeted sentiment tasks. Our analysis on ten languages demonstrates that the amount of unlabeled monolingual data has surprisingly little effect on the sentiment results. As expected, the choice of an annotated source language for projection to a target leads to better results for source-target language pairs which are similar. Therefore, our results suggest that more efforts should be spent on the creation of resources for less similar languages to those which are resource-rich already. Finally, a domain mismatch leads to a decreased performance. This suggests resources in any language should ideally cover varieties of domains.