AITopics | Machine Translation

Machine Translation

"Machine translation (MT) is the application of computers to the task of translating texts from one natural language to another. One of the very earliest pursuits in computer science, MT has proved to be an elusive goal, but today a number of systems are available which produce output which, if not perfect, is of sufficient quality to be useful in a number of specific domains."
– Definition from the European Association for Machine Translation (EAMT).

You can translate text of your choice by using free translators such as: CAPITA, Google Translate, SDL International, SYSTRAN.

Fine-Tuning by Curriculum Learning for Non-Autoregressive Neural Machine Translation

Guo, Junliang, Tan, Xu, Xu, Linli, Qin, Tao, Chen, Enhong, Liu, Tie-Yan

arXiv.org Machine Learning, Nov-21-2019

Non-autoregressive translation (NAT) models remove the dependence on previous target tokens and generate all target tokens in parallel, resulting in significant inference speedup but at the cost of inferior translation accuracy compared to autoregressive translation (AT) models. Considering that AT models have higher accuracy and are easier to train than NAT models, and that both share the same model configurations, a natural idea to improve the accuracy of NAT models is to transfer a well-trained AT model to an NAT model through fine-tuning. However, since AT and NAT models differ greatly in training strategy, straightforward fine-tuning does not work well. In this work, we introduce curriculum learning into fine-tuning for NAT. Specifically, we design a curriculum in the fine-tuning process to progressively switch the training from autoregressive generation to non-autoregressive generation. Experiments on four benchmark translation datasets show that the proposed method achieves good improvement (more than 1 BLEU score) over previous NAT baselines in terms of translation accuracy, and greatly speeds up (more than 10 times) the inference process over AT baselines.
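
The idea of gradually moving training from autoregressive to non-autoregressive generation can be pictured with a pacing function. The sketch below is only an illustration and not the authors' implementation: the linear schedule, the names `nat_probability` and `choose_mode`, and the step counts are all assumptions made for this example.

```python
import random


def nat_probability(step: int, total_steps: int) -> float:
    """Hypothetical linear pacing: share of batches trained in NAT mode.

    Returns 0.0 at the start (purely autoregressive) and 1.0 at the end
    (purely non-autoregressive). The linear shape is an assumption.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    return min(1.0, max(0.0, step / total_steps))


def choose_mode(step: int, total_steps: int, rng: random.Random) -> str:
    """Sample the training mode for one batch according to the pacing."""
    if rng.random() < nat_probability(step, total_steps):
        return "NAT"
    return "AT"


if __name__ == "__main__":
    rng = random.Random(0)
    total = 1000  # assumed length of the fine-tuning run
    for step in (0, 250, 500, 750, 1000):
        p = nat_probability(step, total)
        mode = choose_mode(step, total, rng)
        print(f"step {step:4d}: p(NAT) = {p:.2f}, sampled mode = {mode}")
```

Early batches are almost always drawn in AT mode and late batches in NAT mode; other pacing shapes (for example a step or logarithmic schedule) could be swapped in by changing `nat_probability` alone.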

curriculum, decoder input, translation, (13 more...)

arXiv.org Machine Learning

1911.08717

Country: Asia > China > Anhui Province (0.04)

Genre: Research Report (0.64)

Industry: Education (0.34)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

What Do You Mean `Why?': Resolving Sluices in Conversations

Hansen, Victor Petrén Bach, Søgaard, Anders

arXiv.org Artificial Intelligence, Nov-21-2019

Victor Petrén Bach Hansen (1, 2), Anders Søgaard (1, 3). 1: Department of Computer Science, University of Copenhagen, Denmark; 2: Topdanmark A/S, Denmark; 3: Google Research, Berlin. victor.petren@di.ku.dk, soegaard@di.ku.dk

Abstract: In conversation, we often ask one-word questions such as 'Why?' or 'Who?'. Such questions are typically easy for humans to answer, but can be hard for computers, because their resolution requires retrieving both the right semantic frames and the right arguments from context. This paper introduces the novel ellipsis resolution task of resolving such one-word questions, referred to as sluices in linguistics. We present a crowd-sourced dataset containing annotations of sluices from over 4,000 dialogues collected from conversational QA datasets, as well as a series of strong baseline architectures.

1 Introduction: Stand-alone wh-word questions, such as 'When?' in Figure 1, are easy for us to understand, but in order to interpret them we need to retrieve implicit information from context. Learning to do so is an instance of sluicing, an ellipsis phenomenon, defined by Ross (1969) as 'the effect of deleting everything but the preposed constituent of an embedded question, under the condition that the remainder of the question is identical to some other part of the sentence, or a preceding sentence.' In the context of conversations, one-word wh-word questions are particularly frequent (Anand and Hardt 2016; Rønning, Hardt, and Søgaard 2018), and because they are often hard to resolve, they seem to be a frequent source of error in conversational question answering (Choi et al. 2018; Reddy, Chen, and Manning 2018) and dialogue understanding (Vlachos and Clark 2014). We refer to this type of sluicing as conversational sluicing.

Unlike previous work where sluice resolution is treated as predicting the span of the antecedent (Anand and Hardt 2016; Rønning, Hardt, and Søgaard 2018), we frame conversational sluice resolution as a Natural Language Generation (NLG) task, in which we seek to automatically generate the full question, given a question-answer context and a one-word question.

dataset, resolution, sluice, (16 more...)

arXiv.org Artificial Intelligence

1911.09478

Country:

Europe > Denmark > Capital Region > Copenhagen (0.24)
North America > United States > California > San Diego County > San Diego (0.04)
North America > United States > Ohio > Franklin County > Columbus (0.04)
(5 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.94)

Visualisation of embedding relations (Word2Vec, BERT)

#artificialintelligence, Nov-19-2019, 09:58:24 GMT

In this story, we will visualise the word embedding vectors to understand the relations between words described by the embeddings. This story focuses on word2vec [1] and BERT [2]. To understand the embeddings themselves, I suggest reading a separate introduction, as this story does not aim to describe them. This story is part of my journey to develop Neural Machine Translation (NMT) using BERT contextualised embedding vectors. Word embeddings are models that generate computer-friendly numeric vector representations for words.
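
One simple way to picture relations between word vectors, assumed here rather than taken from the story, is to take dot products with two direction vectors and treat the results as 2D plot coordinates. The toy 3-dimensional vectors, the choice of axes, and the helper names below are all invented for illustration; real word2vec or BERT vectors have hundreds of dimensions.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [a / norm for a in v]


def project_2d(vector, x_axis, y_axis):
    """Coordinates of `vector` along two (normalized) axis directions."""
    return (dot(vector, normalize(x_axis)), dot(vector, normalize(y_axis)))


# Toy "embeddings"; the values are made up so that the axes are orthogonal.
embeddings = {
    "king": [0.9, 0.9, 0.1],
    "queen": [0.9, 0.1, 0.9],
    "man": [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
}

# Hypothetical axes: x points from "man" to "king", y from "man" to "woman".
x_axis = [k - m for k, m in zip(embeddings["king"], embeddings["man"])]
y_axis = [w - m for w, m in zip(embeddings["woman"], embeddings["man"])]

for word, vec in embeddings.items():
    x, y = project_2d(vec, x_axis, y_axis)
    print(f"{word:6s} x = {x:+.3f}  y = {y:+.3f}")
```

In this toy setup, words that differ only along the second axis share the same x coordinate, which is the kind of pattern a 2D scatter plot of the projected points would make visible.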

matrix, projection, vector, (15 more...)

#artificialintelligence

Country: North America > Mexico (0.06)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.63)

Graph Transformer for Graph-to-Sequence Learning

Cai, Deng, Lam, Wai

arXiv.org Artificial Intelligence, Nov-18-2019

The dominant graph-to-sequence transduction models employ graph neural networks for graph representation learning, where the structural information is reflected by the receptive field of neurons. Unlike graph neural networks, which restrict information exchange to immediate neighborhoods, we propose a new model, known as Graph Transformer, that uses explicit relation encoding and allows direct communication between two distant nodes. It provides a more efficient way to model global graph structure. Experiments on the applications of text generation from Abstract Meaning Representation (AMR) and syntax-based neural machine translation show the superiority of our proposed model. Specifically, our model achieves 27.4 BLEU on LDC2015E86 and 29.7 BLEU on LDC2017T10 for AMR-to-text generation, outperforming the state-of-the-art results by up to 2.2 points. On the syntax-based translation tasks, our model establishes new single-model state-of-the-art BLEU scores, 21.3 for English-to-German and 14.1 for English-to-Czech, improving over the existing best results, including ensembles, by over 1 BLEU.

graph, node, representation, (15 more...)

arXiv.org Artificial Intelligence

1911.0747

Country: Asia > China > Hong Kong (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Understanding and Improving Layer Normalization

Xu, Jingjing, Sun, Xu, Zhang, Zhiyuan, Zhao, Guangxiang, Lin, Junyang

arXiv.org Machine Learning, Nov-16-2019

Layer normalization (LayerNorm) is a technique to normalize the distributions of intermediate layers. It enables smoother gradients, faster training, and better generalization accuracy. However, it is still unclear where the effectiveness stems from. In this paper, our main contribution is to take a step further in understanding LayerNorm. Many previous studies hold that the success of LayerNorm comes from forward normalization. Unlike them, we find that the derivatives of the mean and variance are more important than forward normalization, because they re-center and re-scale backward gradients. Furthermore, we find that the parameters of LayerNorm, including the bias and gain, increase the risk of over-fitting and do not work in most cases. Experiments show that a simple version of LayerNorm (LayerNorm-simple) without the bias and gain outperforms LayerNorm on four datasets. It obtains state-of-the-art performance on En-Vi machine translation. To address the over-fitting problem, we propose a new normalization method, Adaptive Normalization (AdaNorm), by replacing the bias and gain with a new transformation function. Experiments show that AdaNorm demonstrates better results than LayerNorm on seven out of eight datasets.
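
To make the difference between LayerNorm with and without bias and gain concrete, here is a minimal pure-Python sketch of the forward pass. It is not the authors' code: the function name `layer_norm` and the small `eps` term (a common numerical-stability convention) are assumptions, and AdaNorm is not shown because its transformation function is not given in the abstract.

```python
import math


def layer_norm(x, gain=None, bias=None, eps=1e-5):
    """Normalize one vector of activations to zero mean and unit variance.

    With `gain` and `bias` left as None, this is the parameter-free
    variant (what the abstract calls LayerNorm-simple). Passing both
    gives the usual form with an element-wise scale and shift.
    `eps` is an assumed stability constant, not taken from the paper.
    """
    if not x:
        raise ValueError("x must not be empty")
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    normed = [(v - mean) / math.sqrt(var + eps) for v in x]
    if gain is not None:
        normed = [g * v for g, v in zip(gain, normed)]
    if bias is not None:
        normed = [v + b for v, b in zip(normed, bias)]
    return normed


activations = [1.0, 2.0, 3.0, 4.0]
print("simple:", layer_norm(activations))
print("with gain and bias:", layer_norm(activations, gain=[2.0] * 4, bias=[1.0] * 4))
```

The only difference between the two calls is the final scale and shift, which is the part the abstract reports as raising the risk of over-fitting.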

derivative, layernorm, normalization, (15 more...)

arXiv.org Machine Learning

1911.07013

Country:

Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
(6 more...)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)

DataCareer: Your Career Platform for Data Science in the UK and Ireland

#artificialintelligence, Nov-15-2019, 13:14:37 GMT

Grade: G13/3 (net (basic) monthly salary* for this vacancy: EUR 12 435,12, which may be supplemented by various allowances depending on your personal circumstances)
Duration of appointment: 5 years
Career path: Managerial
Location: Munich
Application deadline: 17.11.2019

With almost 7 000 employees, the European Patent Office (EPO) is the second-largest public service institution in Europe. It supports innovation, competitiveness and economic growth across Europe through a commitment to high-quality and efficient services delivered under the European Patent Convention, its founding treaty. It has a yearly budget of EUR 2.3 billion, entirely financed by the fees paid by its users. As set out in its Strategic Plan 2023, the EPO is proud to deliver high-quality patents and efficient services that foster innovation, competitiveness and economic growth.

data protection guideline, epo, protection, (9 more...)

#artificialintelligence

Country:

Europe > United Kingdom (0.40)
Europe > Ireland (0.40)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.26)
(2 more...)

Industry:

Banking & Finance (0.58)
Information Technology > Security & Privacy (0.38)
Law > Intellectual Property & Technology Law (0.32)

Technology:

Information Technology > Data Science (0.40)
Information Technology > Security & Privacy (0.38)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.31)

Embedding Projection for Targeted Cross-lingual Sentiment: Model Comparisons and a Real-World Study

Barnes, Jeremy (University of Oslo) | Klinger, Roman

Journal of Artificial Intelligence Research, Nov-15-2019

Sentiment analysis benefits from large, hand-annotated resources in order to train and test machine learning models, which are often data hungry. While some languages, e.g., English, have a vast array of these resources, most under-resourced languages do not, especially for fine-grained sentiment tasks, such as aspect-level or targeted sentiment analysis. To improve this situation, we propose a cross-lingual approach to sentiment analysis that is applicable to under-resourced languages and takes into account target-level information. This model incorporates sentiment information into bilingual distributional representations, by jointly optimizing them for semantics and sentiment, showing state-of-the-art performance at sentence-level when combined with machine translation. The adaptation to targeted sentiment analysis on multiple domains shows that our model outperforms other projection-based bilingual embedding methods on binary targeted sentiment tasks. Our analysis on ten languages demonstrates that the amount of unlabeled monolingual data has surprisingly little effect on the sentiment results. As expected, the choice of an annotated source language for projection to a target leads to better results for source-target language pairs which are similar. Therefore, our results suggest that more effort should be spent on the creation of resources for languages that are less similar to those which are already resource-rich. Finally, a domain mismatch leads to decreased performance. This suggests that resources in any language should ideally cover a variety of domains.

computational linguistic, proceedings, sentiment analysis, (14 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.1.11561

AI Access Foundation

11561

Country:

Europe > United Kingdom > England > Greater London > London (0.14)
Europe > Norway > Eastern Norway > Oslo (0.05)
Europe > Norway > Eastern Norway > Akershus (0.04)
(34 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Services (0.68)
Leisure & Entertainment (0.67)
Government > Voting & Elections (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Legal translation tool launching for French

#artificialintelligence, Nov-13-2019, 18:01:47 GMT

In addition to being designed particularly for the French markets of Canada, the company is trying to lure customers with enterprise-centred options such as customization, review by human translators, and cybersecurity. Kalaci says the technology, which is not affiliated with Amazon's Alexa, is hosted on Canadian servers and the text is destroyed once it is translated. There is also an option for firms to use their data to train a customised tool. Either way, he says, is an improvement over free services offered on the web. "Most web-based tools you use, have a disclosure wherein they say, 'Any content you put in here, we keep.' And that's how they keep improving their tools," says Kalaci.

kalaci, legal translation tool

#artificialintelligence

Country: North America > Canada (0.29)

Industry:

Law (0.85)
Information Technology (0.75)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.40)

Human-centric Metric for Accelerating Pathology Reports Annotation

Ma, Ruibin, Chen, Po-Hsuan Cameron, Li, Gang, Weng, Wei-Hung, Lin, Angela, Gadepalli, Krishna, Cai, Yuannan

arXiv.org Machine Learning, Nov-12-2019

Pathology reports contain useful information such as the main involved organ, diagnosis, etc. This information can be identified from the free-text reports and used for large-scale statistical analysis or serve as annotation for other modalities such as pathology slide images. However, manual classification of a huge number of reports on multiple tasks is labor-intensive. In this paper, we have developed an automatic text classifier based on BERT and we propose a human-centric metric to evaluate the model. According to the model confidence, we identify low-confidence cases that require further expert annotation and high-confidence cases that are automatically classified. We report the percentage of low-confidence cases and the performance of automatically classified cases. On the high-confidence cases, the model achieves classification accuracy comparable to pathologists. This could potentially reduce the manual annotation workload by 80% to 98%.
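
The confidence-based split described in the abstract can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's metric implementation: the 0.9 threshold, the record layout `(predicted_label, confidence, true_label)`, the function names, and the example data are all hypothetical.

```python
def triage(records, threshold=0.9):
    """Split records into auto-classified and expert-review groups.

    Each record is (predicted_label, confidence, true_label).
    The threshold value is an assumption chosen for this example.
    """
    auto = [r for r in records if r[1] >= threshold]
    review = [r for r in records if r[1] < threshold]
    return auto, review


def report(records, threshold=0.9):
    """Return (percent of low-confidence cases, accuracy on auto cases)."""
    if not records:
        raise ValueError("records must not be empty")
    auto, review = triage(records, threshold)
    low_conf_pct = 100.0 * len(review) / len(records)
    if auto:
        auto_acc = sum(1 for pred, _, gold in auto if pred == gold) / len(auto)
    else:
        auto_acc = None  # nothing was confident enough to auto-classify
    return low_conf_pct, auto_acc


# Invented example predictions, not data from the paper.
records = [
    ("carcinoma", 0.98, "carcinoma"),
    ("benign", 0.95, "benign"),
    ("benign", 0.60, "carcinoma"),
    ("carcinoma", 0.92, "carcinoma"),
    ("benign", 0.55, "benign"),
]

low_pct, acc = report(records)
print(f"low-confidence cases: {low_pct:.1f}%  accuracy on auto-classified: {acc:.2f}")
```

Raising the threshold sends more reports to expert review and typically raises accuracy on the automatically classified remainder, so the two reported numbers trade off against each other.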

annotation, bert encoder, classification, (12 more...)

arXiv.org Machine Learning

1911.01226

Country: North America > United States (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Diagnostic Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
