Machine Translation


Graph Transformer for Graph-to-Sequence Learning

arXiv.org Artificial Intelligence

The dominant graph-to-sequence transduction models employ graph neural networks for graph representation learning, where the structural information is reflected by the receptive field of neurons. Unlike graph neural networks, which restrict information exchange to the immediate neighborhood, we propose a new model, known as Graph Transformer, that uses explicit relation encoding and allows direct communication between two distant nodes. It provides a more efficient way for global graph structure modeling. Experiments on the applications of text generation from Abstract Meaning Representation (AMR) and syntax-based neural machine translation show the superiority of our proposed model. Specifically, our model achieves 27.4 BLEU on LDC2015E86 and 29.7 BLEU on LDC2017T10 for AMR-to-text generation, outperforming the state-of-the-art results by up to 2.2 points. On the syntax-based translation tasks, our model establishes new single-model state-of-the-art BLEU scores, 21.3 for English-to-German and 14.1 for English-to-Czech, improving over the existing best results, including ensembles, by over 1 BLEU.


Understanding and Improving Layer Normalization

arXiv.org Machine Learning

Layer normalization (LayerNorm) is a technique to normalize the distributions of intermediate layers. It enables smoother gradients, faster training, and better generalization accuracy. However, it is still unclear where the effectiveness stems from. In this paper, our main contribution is to take a step further in understanding LayerNorm. Many previous studies attribute the success of LayerNorm to forward normalization. Unlike them, we find that the derivatives of the mean and variance, which re-center and re-scale backward gradients, are more important than forward normalization. Furthermore, we find that the parameters of LayerNorm, including the bias and gain, increase the risk of over-fitting and do not work in most cases. Experiments show that a simple version of LayerNorm (LayerNorm-simple) without the bias and gain outperforms LayerNorm on four datasets. It obtains state-of-the-art performance on En-Vi machine translation. To address the over-fitting problem, we propose a new normalization method, Adaptive Normalization (AdaNorm), by replacing the bias and gain with a new transformation function. Experiments show that AdaNorm demonstrates better results than LayerNorm on seven out of eight datasets.
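
The difference between standard LayerNorm and the bias-free, gain-free variant described above can be sketched in a few lines. This is a minimal illustration on plain Python lists, not the authors' implementation: the function names and the `eps` stabilizer are assumptions, and AdaNorm is not shown because the abstract does not specify its transformation function.

```python
import math


def layer_norm_simple(x, eps=1e-5):
    """Normalize a vector to zero mean and (near) unit variance.

    No learned gain or bias is applied. `eps` is an assumed
    numerical-stability constant, not a value taken from the paper.
    """
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(v - mean) * inv_std for v in x]


def layer_norm(x, gain, bias, eps=1e-5):
    """Standard LayerNorm: normalize, then apply element-wise gain and bias."""
    normed = layer_norm_simple(x, eps)
    return [g * h + b for g, h, b in zip(gain, normed, bias)]
```

With a gain of all ones and a bias of all zeros, `layer_norm` reduces to `layer_norm_simple`, which makes explicit that the two variants differ only in the learned parameters.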


DataCareer: Your Career Platform for Data Science in the UK and Ireland

#artificialintelligence

Grade: G13/3 (net (basic) monthly salary* for this vacancy: EUR 12 435,12, which may be supplemented by various allowances depending on your personal circumstances)
Duration of appointment: 5 years
Career path: Managerial
Location: Munich
Application deadline: 17.11.2019

With almost 7 000 employees, the European Patent Office (EPO) is the second-largest public service institution in Europe. It supports innovation, competitiveness and economic growth across Europe through a commitment to high-quality and efficient services delivered under the European Patent Convention, its founding treaty. It has a yearly budget of EUR 2.3 billion, entirely financed by the fees paid by its users. As set out in its Strategic Plan 2023, the EPO is proud to deliver high-quality patents and efficient services that foster innovation, competitiveness and economic growth.


Embedding Projection for Targeted Cross-lingual Sentiment: Model Comparisons and a Real-World Study

Journal of Artificial Intelligence Research

Sentiment analysis benefits from large, hand-annotated resources in order to train and test machine learning models, which are often data hungry. While some languages, e.g., English, have a vast array of these resources, most under-resourced languages do not, especially for fine-grained sentiment tasks, such as aspect-level or targeted sentiment analysis. To improve this situation, we propose a cross-lingual approach to sentiment analysis that is applicable to under-resourced languages and takes into account target-level information. This model incorporates sentiment information into bilingual distributional representations, by jointly optimizing them for semantics and sentiment, showing state-of-the-art performance at sentence-level when combined with machine translation. The adaptation to targeted sentiment analysis on multiple domains shows that our model outperforms other projection-based bilingual embedding methods on binary targeted sentiment tasks. Our analysis on ten languages demonstrates that the amount of unlabeled monolingual data has surprisingly little effect on the sentiment results. As expected, the choice of an annotated source language for projection to a target leads to better results for source-target language pairs which are similar. Therefore, our results suggest that more efforts should be spent on the creation of resources for languages less similar to those which are resource-rich already. Finally, a domain mismatch leads to a decreased performance. This suggests resources in any language should ideally cover varieties of domains.


Legal translation tool launching for French

#artificialintelligence

In addition to being designed particularly for the French markets of Canada, the company is trying to lure customers with enterprise-centred options such as customization, review by human translators, and cybersecurity. Kalaci says the technology, which is not affiliated with Amazon's Alexa, is hosted on Canadian servers and the text is destroyed once it is translated. There is also an option for firms to use their data to train a customised tool. Either way, he says, is an improvement over free services offered on the web. "Most web-based tools you use, have a disclosure wherein they say, 'Any content you put in here, we keep.' And that's how they keep improving their tools," says Kalaci.


Human-centric Metric for Accelerating Pathology Reports Annotation

arXiv.org Machine Learning

Pathology reports contain useful information such as the main involved organ, diagnosis, etc. This information can be identified from the free-text reports and used for large-scale statistical analysis or serve as annotation for other modalities such as pathology slide images. However, manual classification of a huge number of reports on multiple tasks is labor-intensive. In this paper, we have developed an automatic text classifier based on BERT and we propose a human-centric metric to evaluate the model. According to the model confidence, we identify low-confidence cases that require further expert annotation and high-confidence cases that are automatically classified. We report the percentage of low-confidence cases and the performance of automatically classified cases. On the high-confidence cases, the model achieves classification accuracy comparable to pathologists. This could reduce the manual annotation workload by 80% to 98%.
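
The confidence-based split described above can be sketched as follows. This is a hedged illustration, not the paper's code: the function name `triage`, the tuple layout, and the `threshold` value of 0.9 are all assumptions, and the abstract does not say how confidence is computed or where the cut-off lies.

```python
def triage(predictions, threshold=0.9):
    """Split predictions into auto-classified and expert-review cases.

    `predictions` is a list of (confidence, predicted_label, true_label)
    tuples. `threshold` is a hypothetical cut-off chosen for illustration.

    Returns (low_confidence_rate, auto_accuracy):
      - low_confidence_rate: share of cases sent to an expert
      - auto_accuracy: accuracy on the automatically classified cases,
        or None if no case reaches the threshold
    """
    auto = [p for p in predictions if p[0] >= threshold]
    manual = [p for p in predictions if p[0] < threshold]

    low_confidence_rate = len(manual) / len(predictions) if predictions else 0.0
    if auto:
        correct = sum(1 for _, pred, true in auto if pred == true)
        auto_accuracy = correct / len(auto)
    else:
        auto_accuracy = None
    return low_confidence_rate, auto_accuracy
```

The two returned numbers correspond to the two quantities the abstract reports: the fraction of reports that still need manual annotation, and how well the model performs on the rest.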


Improving Robustness of Task Oriented Dialog Systems

arXiv.org Artificial Intelligence

Task-oriented language understanding in dialog systems is often modeled using intents (task of a query) and slots (parameters for that task). Intent detection and slot tagging are, in turn, modeled using sentence classification and word tagging techniques respectively. Similar to adversarial attack problems with computer vision models discussed in existing literature, these intent-slot tagging models are often over-sensitive to small variations in input -- predicting different and often incorrect labels when small changes are made to a query, thus reducing their accuracy and reliability. However, evaluating a model's robustness to these changes is harder for language since words are discrete and an automated change (e.g. adding 'noise') to a query sometimes changes the meaning and thus labels of a query. In this paper, we first describe how to create an adversarial test set to measure the robustness of these models. Furthermore, we introduce and adapt adversarial training methods as well as data augmentation using back-translation to mitigate these issues. Our experiments show that both techniques improve the robustness of the system substantially and can be combined to yield the best results.


Syntax-Infused Transformer and BERT models for Machine Translation and Natural Language Understanding

arXiv.org Machine Learning

Attention-based models have shown significant improvement over traditional algorithms in several NLP tasks. The Transformer, for instance, is an illustrative example that generates abstract representations of tokens input to an encoder based on their relationships to all tokens in a sequence. Recent studies have shown that although such models are capable of learning syntactic features purely by seeing examples, explicitly feeding this information to deep learning models can significantly enhance their performance. Leveraging syntactic information like part of speech (POS) may be particularly beneficial in limited training data settings for complex models such as the Transformer. We show that the syntax-infused Transformer with multiple features achieves an improvement of 0.7 BLEU when trained on the full WMT '14 English to German translation dataset and a maximum improvement of 1.99 BLEU points when trained on a fraction of the dataset. In addition, we find that the incorporation of syntax into BERT fine-tuning outperforms the baseline on a number of downstream tasks from the GLUE benchmark.

Introduction

Attention-based deep learning models for natural language processing (NLP) have shown promise for a variety of machine translation and natural language understanding tasks. For word-level, sequence-to-sequence tasks such as translation, paraphrasing, and text summarization, attention-based models allow a single token (e.g., a word or subword) in a sequence to be represented as a combination of all tokens in the sequence (Luong, Pham, and Manning, 2015). The distributed context allows attention-based models to infer rich representations for tokens, leading to more robust performance.
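
The idea of representing one token as a combination of all tokens in the sequence can be illustrated with a minimal dot-product attention sketch. This is a simplification chosen for illustration: it uses unscaled dot-product scores on plain lists, with no learned projections, multiple heads, or syntactic features, so it is not the paper's syntax-infused model. The names `softmax` and `attend` are assumptions.

```python
import math


def softmax(scores):
    """Convert raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, tokens):
    """Represent `query` as a weighted combination of all token vectors.

    Scores are plain dot products between the query and each token.
    Returns (weights, context), where context is the weighted sum.
    """
    scores = [sum(q * t for q, t in zip(query, tok)) for tok in tokens]
    weights = softmax(scores)
    dim = len(tokens[0])
    context = [
        sum(w * tok[i] for w, tok in zip(weights, tokens)) for i in range(dim)
    ]
    return weights, context
```

Calling `attend(tokens[0], tokens)` yields one weight per token in the sequence, so tokens that are more similar to the query contribute more to its new representation.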


A Massive Collection of Cross-Lingual Web-Document Pairs

arXiv.org Machine Learning

Cross-lingual document alignment aims to identify pairs of documents in two distinct languages that are of comparable content or translations of each other. Small-scale efforts have been made to collect aligned document-level data on a limited set of language pairs such as English-German or on limited comparable collections such as Wikipedia. In this paper, we mine twelve snapshots of the Common Crawl corpus and identify web document pairs that are translations of each other. We release a new web dataset consisting of 54 million URL pairs from Common Crawl covering documents in 92 languages paired with English. We evaluate the quality of the dataset by measuring the quality of machine translations from models that have been trained on mined parallel sentence pairs from this aligned corpus and introduce a simple yet effective baseline for identifying these aligned documents. The objective of this dataset and paper is to foster new research in cross-lingual NLP across a variety of low, mid, and high-resource languages.