AITopics | Machine Translation

Machine Translation

"Machine translation (MT) is the application of computers to the task of translating texts from one natural language to another. One of the very earliest pursuits in computer science, MT has proved to be an elusive goal, but today a number of systems are available which produce output which, if not perfect, is of sufficient quality to be useful in a number of specific domains."
– Definition from the European Association for Machine Translation (EAMT).

You can translate text of your choice by using free translators such as: CAPITA, Google Translate, SDL International, SYSTRAN.

Evaluating Sequence-to-Sequence Learning Models for If-Then Program Synthesis

Dalal, Dhairya, Galbraith, Byron V.

arXiv.org Machine Learning, Feb-9-2020

Implementing enterprise process automation often requires significant technical expertise and engineering effort. It would be beneficial for non-technical users to be able to describe a business process in natural language and have an intelligent system generate the workflow that can be automatically executed. If-Then programs are a building block of process automation. In the consumer space, sites like IFTTT and Zapier allow users to create automations by defining If-Then programs using a graphical interface. We explore the efficacy of modeling If-Then programs as a sequence learning task. We find Seq2Seq approaches have high potential (performing strongly on the Zapier recipes) and can serve as a promising approach to more complex program synthesis challenges.

dataset, recipe, sequence, (15 more...)

arXiv:2002.03485

Country:

North America > United States > Massachusetts > Suffolk County > Boston (0.05)
Europe > Germany > Berlin (0.04)
Europe > Denmark > Capital Region > Copenhagen (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.84)

Industry: Information Technology (0.43)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (0.64)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Time-aware Large Kernel Convolutions

Lioutas, Vasileios, Guo, Yuhong

arXiv.org Machine Learning, Feb-8-2020

To date, most state-of-the-art sequence modelling architectures use attention to build generative models for language based tasks. Some of these models use all the available sequence tokens to generate an attention distribution which results in time complexity of $O(n^2)$. Alternatively, they utilize depthwise convolutions with softmax normalized kernels of size $k$ acting as a limited-window self-attention, resulting in time complexity of $O(k{\cdot}n)$. In this paper, we introduce Time-aware Large Kernel (TaLK) Convolutions, a novel adaptive convolution operation that learns to predict the size of a summation kernel instead of using the fixed-sized kernel matrix. This method yields a time complexity of $O(n)$, effectively making the sequence encoding process linear to the number of tokens. We evaluate the proposed method on large-scale standard machine translation and language modelling datasets and show that TaLK Convolutions constitute an efficient improvement over other attention/convolution based approaches.
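The linear-time claim in this abstract can be illustrated with a prefix-sum table, which turns any contiguous window sum into a single subtraction. The following is a minimal sketch of that idea only, not the authors' implementation: the function name `adaptive_window_sums` and the integer `left`/`right` offsets are hypothetical, and the offsets are given as fixed inputs here, whereas the abstract describes them as learned and predicted per token.

```python
from itertools import accumulate


def adaptive_window_sums(values, left, right):
    """Sum values[i - left[i] .. i + right[i]] for every position i.

    Each position has its own window size, yet the total cost stays O(n)
    because every window sum is read off a prefix-sum table.
    """
    n = len(values)
    # prefix[j] is the sum of values[:j]; prefix[0] is the empty sum.
    prefix = [0.0] + list(accumulate(values))
    sums = []
    for i in range(n):
        lo = max(0, i - left[i])        # clamp the window to the sequence start
        hi = min(n - 1, i + right[i])   # clamp the window to the sequence end
        sums.append(prefix[hi + 1] - prefix[lo])
    return sums


# Hypothetical per-position offsets; a real model would predict these.
print(adaptive_window_sums([1, 2, 3, 4], left=[0, 1, 1, 2], right=[1, 0, 1, 0]))
```

By contrast, a fixed kernel of size $k$ applied naively touches $k$ inputs per position, which is where the $O(k{\cdot}n)$ cost mentioned in the abstract comes from.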

convolution, machine learning, natural language, (17 more...)

arXiv:2002.03184

Country: North America > Canada > Ontario > National Capital Region > Ottawa (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

CCMatrix: A billion-scale bitext data set for training translation models

#artificialintelligence, Feb-7-2020, 18:41:06 GMT

CCMatrix is the largest data set of high-quality, web-based bitexts for training translation models. With more than 4.5 billion parallel sentences in 576 language pairs pulled from snapshots of the CommonCrawl public data set, CCMatrix is more than 50 times larger than the WikiMatrix corpus that we shared last year. Gathering a data set of this size required modifying our previous bitext mining approach used for WikiMatrix, assuming that the translation of one sentence could be found anywhere on CommonCrawl, which functions as an open archive of the internet. To address the significant computational challenges posed by comparing billions of sentences to determine which ones are mutual translations, we used massively parallel processing, as well as our highly efficient FAISS library for fast similarity searches. We're sharing details about how we created CCMatrix, and the tools needed for other researchers to reproduce our results and use this corpus for their work.
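As a rough illustration of what "comparing sentences to determine which ones are mutual translations" can mean, the sketch below pairs sentence vectors by cosine similarity and keeps only mutual nearest neighbours. It is a toy under stated assumptions, not the CCMatrix pipeline: the embeddings are made-up two-dimensional vectors, the search is brute force rather than FAISS, and the names `mine_pairs` and `threshold` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest(query, pool):
    """Index and score of the vector in pool most similar to query."""
    scores = [cosine(query, p) for p in pool]
    best = max(range(len(pool)), key=scores.__getitem__)
    return best, scores[best]


def mine_pairs(src_vecs, tgt_vecs, threshold=0.8):
    """Return (src_index, tgt_index, score) for mutual nearest neighbours."""
    pairs = []
    for i, src in enumerate(src_vecs):
        j, score = nearest(src, tgt_vecs)
        back, _ = nearest(tgt_vecs[j], src_vecs)
        # Keep the pair only if each side picks the other and the score is high.
        if back == i and score >= threshold:
            pairs.append((i, j, score))
    return pairs


# Made-up embeddings standing in for multilingual sentence vectors.
source = [[1.0, 0.0], [0.0, 1.0]]
target = [[0.0, 2.0], [3.0, 0.1]]
print([(i, j) for i, j, _ in mine_pairs(source, target)])
```

The brute-force loop costs time proportional to the product of the two collection sizes, which is why a billion-sentence comparison needs an approximate similarity-search index and parallel processing, as the summary notes.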

billion-scale bitext data, ccmatrix, training translation model, (7 more...)

Genre: Research Report (0.37)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Communications > Social Media (0.87)

Translating Web Search Queries into Natural Language Questions

Kumar, Adarsh, Dandapat, Sandipan, Chordia, Sushil

arXiv.org Artificial Intelligence, Feb-7-2020

Users often query a search engine with a specific question in mind, and these queries are often keywords or sub-sentential fragments. For example, a user who wants to know the answer to "What's the capital of USA" will most probably query "capital of USA", "USA capital", or some other keyword-based variation. Conversely, for the user-entered query "capital of USA", the most probable question intent is "What's the capital of USA?". In this paper, we propose a method to generate a well-formed natural language question from a given keyword-based query, such that the question has the same intent as the query. Converting a keyword-based web query into a well-formed question has many applications, including search engines, Community Question Answering (CQA) websites, and bot communication. We found a synergy between the query-to-question problem and the standard machine translation (MT) task. We have used both Statistical MT (SMT) and Neural MT (NMT) models to generate the questions from the query. We have observed that MT models perform well in terms of both automatic and human evaluation.

query, question intent, search engine, (13 more...)

arXiv:2002.02631

Country:

North America > United States > Kansas (0.05)
Asia > Middle East > UAE > Dubai Emirate > Dubai (0.05)
Europe > Bulgaria > Sofia City Province > Sofia (0.04)
(2 more...)

Genre: Research Report (0.40)

Industry: Health & Medicine (0.70)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Consistency of a Recurrent Language Model With Respect to Incomplete Decoding

Welleck, Sean, Kulikov, Ilia, Kim, Jaedeok, Pang, Richard Yuanzhe, Cho, Kyunghyun

arXiv.org Machine Learning, Feb-6-2020

Despite strong performance on a variety of tasks, neural sequence models trained with maximum likelihood have been shown to exhibit issues such as length bias and degenerate repetition. We study the related issue of receiving infinite-length sequences from a recurrent language model when using common decoding algorithms. To analyze this issue, we first define inconsistency of a decoding algorithm, meaning that the algorithm can yield an infinite-length sequence that has zero probability under the model. We prove that commonly used incomplete decoding algorithms - greedy search, beam search, top-k sampling, and nucleus sampling - are inconsistent, despite the fact that recurrent language models are trained to produce sequences of finite length. Based on these insights, we propose two remedies which address inconsistency: consistent variants of top-k and nucleus sampling, and a self-terminating recurrent language model. Empirical results show that inconsistency occurs in practice, and that the proposed methods prevent inconsistency.
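One way to read the "consistent variant" of top-k sampling is that the end-of-sequence token is always kept in the candidate set, so the sampler can never make termination impossible. The sketch below follows that reading as an assumption. The function name, the `eos_id` parameter, and the toy distribution are hypothetical, and the paper should be consulted for the exact definition.

```python
import random


def consistent_top_k_sample(probs, k, eos_id, rng=random):
    """Sample a token id from the k most likely tokens plus the EOS token.

    Plain top-k drops EOS whenever it is not among the k best tokens, so a
    sequence may never terminate. Keeping EOS eligible at every step avoids
    that; candidates are drawn in proportion to their model probabilities.
    """
    top = sorted(range(len(probs)), key=lambda t: probs[t], reverse=True)[:k]
    candidates = sorted(set(top) | {eos_id})
    weights = [probs[t] for t in candidates]
    # random.choices renormalises the weights over the candidate set.
    return rng.choices(candidates, weights=weights, k=1)[0]


# Toy next-token distribution; token 3 plays the role of EOS.
rng = random.Random(0)
draws = [consistent_top_k_sample([0.5, 0.3, 0.15, 0.05], k=2, eos_id=3, rng=rng)
         for _ in range(1000)]
print(sorted(set(draws)))
```

With `k=2`, token 2 is never sampled even though it is more probable than EOS. The candidate set is the top two tokens plus EOS, not the top three.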

algorithm, language model, sequence, (13 more...)

arXiv:2002.02492

Country:

North America > United States > Texas (0.04)
North America > United States > New York (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
(5 more...)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.53)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.49)

Translate this: How real-time translation breaks down barriers when you don't speak the language

USATODAY - Tech Top Stories, Feb-5-2020, 12:50:55 GMT

In the sci-fi world crafted by Douglas Adams in "The Hitchhiker's Guide to the Galaxy," you'd just slap a bright yellow Babel fish in your ear and simply be able to understand any mix of languages around you. While we aren't quite there yet, language is becoming less of a barrier than in generations past. "Understanding is going to become the new normal," says Dave Limp, Amazon's senior vice president of devices and services. Kids "will never grow up in world where they aren't able to hear any language." To that end, today's technology is helping to interpret and translate the world around us in ways that are nearing seamless and in real time. From apps on your phone to increasingly multilingual virtual personal assistants, communicating as a tourist or with clients, friends and family who don't speak the same language is less of a challenge. Yet for all the genuine gains achieved in translation over the past several years, don't count on your phone, smart speaker, PC or ear device ...

artificial intelligence, natural language, social media, (15 more...)

Country:

Asia > China (0.48)
North America > United States > California > San Francisco County > San Francisco (0.15)

Industry:

Health & Medicine (0.74)
Information Technology (0.70)
Government (0.48)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Smart Language Translation Solutions and Software for Enterprise - Lingmo International

#artificialintelligence, Feb-4-2020, 04:22:23 GMT

We understand that when the language barrier is removed, it is easier to communicate with consumers who speak other languages. We can help you speak to your customers in 80 languages and scale into new international markets with our smart translation solutions.

language translation solution and software, smart language translation solution

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.40)

Neural Machine Translation System of Indic Languages -- An Attention based Approach

Shah, Parth, Bakrola, Vishvajit

arXiv.org Machine Learning, Feb-2-2020

Neural machine translation (NMT) is a recent and effective technique that has led to remarkable improvements over conventional machine translation techniques. The proposed neural machine translation model, developed for the Gujarati language, is an encoder-decoder with an attention mechanism. In India, almost all languages originate from their ancestral language, Sanskrit. They have inevitable similarities, including lexical and named-entity similarity. Translating into Indic languages has always been a challenging task. In this paper, we present a neural machine translation (NMT) system that can efficiently translate Indic languages such as Hindi and Gujarati, which together cover more than 58.49 percent of total speakers in the country. We have evaluated the performance of our NMT model with automatic evaluation metrics such as BLEU, perplexity and TER. A comparison of our network with Google Translate is also presented, in which our network outperformed it by a margin of 6 BLEU points on English-Gujarati translation.

machine translation, translation, translation system, (13 more...)

doi: 10.1109/ICACCP.2019.8882969

arXiv:2002.02758

Country:

Asia > India (0.27)
Asia > Singapore (0.05)
Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
(2 more...)

Genre: Research Report (0.65)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)

Bertrand-DR: Improving Text-to-SQL using a Discriminative Re-ranker

Kelkar, Amol, Relan, Rohan, Bhardwaj, Vaishali, Vaichal, Saurabh, Relan, Peter

arXiv.org Machine Learning, Feb-2-2020

To access data stored in relational databases, users need to understand the database schema and write a query using a query language such as SQL. To simplify this task, text-to-SQL models attempt to translate a user's natural language question into the corresponding SQL query. Recently, several generative text-to-SQL models have been developed. We propose a novel discriminative re-ranker to improve the performance of generative text-to-SQL models by extracting the best SQL query from the beam output predicted by the text-to-SQL generator, resulting in improved performance in the cases where the best query was in the candidate list, but not at the top of the list. We build the re-ranker as a schema-agnostic, fine-tuned BERT classifier. We analyze the relative strengths of the text-to-SQL and re-ranker models across different query hardness levels, and suggest how to combine the two models for optimal performance. We demonstrate the effectiveness of the re-ranker by applying it to two state-of-the-art text-to-SQL models, and achieve a top-4 score on the Spider leaderboard at the time of writing this article.
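The re-ranking step described in this abstract can be sketched as scoring every beam candidate with a separate function and returning the highest-scoring one instead of the generator's first choice. The snippet below is a minimal sketch under assumptions: `rerank_beam` and `toy_score` are hypothetical names, and the word-overlap scorer is a stand-in for the fine-tuned BERT classifier, which is not reproduced here.

```python
def rerank_beam(question, beam, score_fn):
    """Return the beam candidate that score_fn rates highest for the question.

    Ties keep the generator's original ordering, because max() returns the
    first maximal element it encounters.
    """
    return max(beam, key=lambda sql: score_fn(question, sql))


def toy_score(question, sql):
    """Stand-in scorer: count the words shared by the question and the query."""
    question_words = set(question.lower().split())
    sql_words = set(sql.lower().replace(",", " ").split())
    return len(question_words & sql_words)


# The generator ranked the wrong query first; the scorer promotes the second.
beam = ["SELECT age FROM singers", "SELECT name FROM singers"]
print(rerank_beam("list the name of all singers", beam, toy_score))
```

This only helps in the situation the abstract singles out: the correct query must already be somewhere in the beam, since the re-ranker chooses among candidates and does not generate new ones.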

bertrand-dr, query, text-to-sql model, (14 more...)

arXiv:2002.00557

Country:

Asia > Middle East > Jordan (0.05)
North America > United States > California > San Diego County > San Diego (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Databases (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)

Word Sense Disambiguation

#artificialintelligence, Feb-1-2020, 17:36:13 GMT

The history and development of Artificial Intelligence has seen numerous peaks and troughs. Hype around what machines can accomplish leads to boosts in AI funding, while unmet expectations cripple the industry until the next breakthrough. The term AI Winter refers to periods of reduced funding and interest in artificial intelligence development. During the Cold War, there was increased interest in Machine Translation as a way to automate the translation of Russian documents into English. This period also coincided with massive strides in linguistics and the early career of the famed linguist Noam Chomsky.

machine translation, translation, word sense disambiguation, (6 more...)

Industry: Government > Military (0.37)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)