AITopics | Machine Translation

Collaborating Authors

Machine Translation

"Machine translation (MT) is the application of computers to the task of translating texts from one natural language to another. One of the very earliest pursuits in computer science, MT has proved to be an elusive goal, but today a number of systems are available which produce output which, if not perfect, is of sufficient quality to be useful in a number of specific domains."
– Definition from the European Association for Machine Translation (EAMT).

You can translate text of your choice by using free translators such as: CAPITA, Google Translate, SDL International, SYSTRAN.

News Overviews Instructional Materials AI-Alerts Classics

Amortized Noisy Channel Neural Machine Translation

Pang, Richard Yuanzhe, He, He, Cho, Kyunghyun

arXiv.org Artificial IntelligenceJul-18-2022

Noisy channel models have been especially effective in neural machine translation (NMT). However, recent approaches like "beam search and rerank" (BSR) incur significant computation overhead during inference, making real-world application infeasible. We aim to study if it is possible to build an amortized noisy channel NMT model such that when we do greedy decoding during inference, the translation accuracy matches that of BSR in terms of reward (based on the source-to-target log probability and the target-to-source log probability) and quality (based on BLEU and BLEURT). We attempt three approaches to train the new model: knowledge distillation, one-step-deviation imitation learning, and Q learning. The first approach obtains the noisy channel signal from a pseudo-corpus, and the latter two approaches aim to optimize toward a noisy-channel MT reward directly. For all three approaches, the generated translations fail to achieve rewards comparable to BSR, but the translation quality approximated by BLEU and BLEURT is similar to the quality of BSR-produced translations. Additionally, all three approaches speed up inference by 1-2 orders of magnitude.

beam search, trajectory, translation, (14 more...)

arXiv.org Artificial Intelligence

2112.0867

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > New York (0.04)
Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
(8 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

The best Machine Learning Translation Tool?

#artificialintelligenceJul-17-2022, 02:25:22 GMT

There are several possibilities when one wants to quickly translate something into another language. The Google Translator is possibly the most famous solution herefore. But besides that there are some good alternatives like DeepL, which according to some sources is supposed to be the best translator online [1][2][3].

artificial intelligence, machine translation, natural language, (1 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.40)
Information Technology > Artificial Intelligence > Machine Learning (0.40)

Add feedback

IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages

Bugliarello, Emanuele, Liu, Fangyu, Pfeiffer, Jonas, Reddy, Siva, Elliott, Desmond, Ponti, Edoardo Maria, Vulić, Ivan

arXiv.org Artificial IntelligenceJul-17-2022

Reliable evaluation benchmarks designed for replicability and comprehensiveness have driven progress in machine learning. Due to the lack of a multilingual benchmark, however, vision-and-language research has mostly focused on English language tasks. To fill this gap, we introduce the Image-Grounded Language Understanding Evaluation benchmark. IGLUE brings together - by both aggregating pre-existing datasets and creating new ones - visual question answering, cross-modal retrieval, grounded reasoning, and grounded entailment tasks across 20 diverse languages. Our benchmark enables the evaluation of multilingual multimodal models for transfer learning, not only in a zero-shot setting, but also in newly defined few-shot learning setups. Based on the evaluation of the available state-of-the-art models, we find that translate-test transfer is superior to zero-shot transfer and that few-shot learning is hard to harness for many tasks. Moreover, downstream performance is partially explained by the amount of available unlabelled textual data for pretraining, and only weakly by the typological distance of target-source languages. We hope to encourage future research efforts in this area by releasing the benchmark to the community.

computational linguistic, dataset, proceedings, (15 more...)

arXiv.org Artificial Intelligence

2201.11732

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
Europe > Denmark > Capital Region > Copenhagen (0.04)
(24 more...)

Genre: Research Report > New Finding (0.67)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.70)

Add feedback

Behind No language Left Behind

#artificialintelligenceJul-16-2022, 07:30:46 GMT

What if you didn't need English to translate? Meta's new and improved open source AI model'NLLB-200' is capable of translating 200 languages without English! "Communicating across languages is one superpower that AI provides, but as we keep advancing our AI work it's improving everything we do--from showing the most interesting content on Facebook and Instagram, to recommending more relevant ads, to keeping our services safe for everyone", says Mark Zuckerberg, CEO, Meta. Accessibility through language ensures that the benefits of the advancement of technology reach everyone, no matter what language they may speak. Tech companies are assuming a proactive role in attempting to bridge this gap.

artificial intelligence, natural language, translation, (18 more...)

#artificialintelligence

Industry: Information Technology > Services (0.56)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Add feedback

Multilingual Event Linking to Wikidata

Pratapa, Adithya, Gupta, Rishubh, Mitamura, Teruko

arXiv.org Artificial IntelligenceJul-16-2022

We present a task of multilingual linking of events to a knowledge base. We automatically compile a large-scale dataset for this task, comprising of 1.8M mentions across 44 languages referring to over 10.9K events from Wikidata. We propose two variants of the event linking task: 1) multilingual, where event descriptions are from the same language as the mention, and 2) crosslingual, where all event descriptions are in English. On the two proposed tasks, we compare multiple event linking systems including BM25+ (Lv and Zhai, 2011) and multilingual adaptations of the biencoder and crossencoder architectures from BLINK (Wu et al., 2020). In our experiments on the two task variants, we find both biencoder and crossencoder models significantly outperform the BM25+ baseline. Our results also indicate that the crosslingual task is in general more challenging than the multilingual task. To test the out-of-domain generalization of the proposed linking systems, we additionally create a Wikinews-based evaluation set. We present qualitative analysis highlighting various aspects captured by the proposed dataset, including the need for temporal reasoning over context and tackling diverse event descriptions across languages.

computational linguistic, dataset, wikidata, (14 more...)

arXiv.org Artificial Intelligence

2204.06535

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
(40 more...)

Genre: Research Report > New Finding (0.68)

Industry:

Media > Film (1.00)
Leisure & Entertainment > Sports > Olympic Games (1.00)
Government > Military (1.00)
Government > Regional Government > North America Government > United States Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.46)

Add feedback

HLT-MT: High-resource Language-specific Training for Multilingual Neural Machine Translation

Yang, Jian, Yin, Yuwei, Ma, Shuming, Zhang, Dongdong, Li, Zhoujun, Wei, Furu

arXiv.org Artificial IntelligenceJul-15-2022

Multilingual neural machine translation (MNMT) trained in multiple language pairs has attracted considerable attention due to fewer model parameters and lower training costs by sharing knowledge among multiple languages. Nonetheless, multilingual training is plagued by language interference degeneration in shared parameters because of the negative interference among different translation directions, especially on high-resource languages. In this paper, we propose the multilingual translation model with the high-resource language-specific training (HLT-MT) to alleviate the negative interference, which adopts the two-stage training with the language-specific selection mechanism. Specifically, we first train the multilingual model only with the high-resource pairs and select the language-specific modules at the top of the decoder to enhance the translation quality of high-resource directions. Next, the model is further trained on all available corpora to transfer knowledge from high-resource languages (HRLs) to low-resource languages (LRLs). Experimental results show that HLT-MT outperforms various strong baselines on WMT-10 and OPUS-100 benchmarks. Furthermore, the analytic experiments validate the effectiveness of our method in mitigating the negative interference in multilingual training.

low-resource language, machine translation, translation, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.24963/ijcai.2022/619

2207.04906

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Add feedback

Boosting Span-based Joint Entity and Relation Extraction via Squence Tagging Mechanism

Ji, Bin, Li, Shasha, Yu, Jie, Ma, Jun, Liu, Huijun

arXiv.org Artificial IntelligenceJul-15-2022

Span-based joint extraction simultaneously conducts named entity recognition (NER) and relation extraction (RE) in text span form. Recent studies have shown that token labels can convey crucial task-specific information and enrich token semantics. However, as far as we know, due to completely abstain from sequence tagging mechanism, all prior span-based work fails to use token label in-formation. To solve this problem, we pro-pose Sequence Tagging enhanced Span-based Network (STSN), a span-based joint extrac-tion network that is enhanced by token BIO label information derived from sequence tag-ging based NER. By stacking multiple atten-tion layers in depth, we design a deep neu-ral architecture to build STSN, and each atten-tion layer consists of three basic attention units. The deep neural architecture first learns seman-tic representations for token labels and span-based joint extraction, and then constructs in-formation interactions between them, which also realizes bidirectional information interac-tions between span-based NER and RE. Fur-thermore, we extend the BIO tagging scheme to make STSN can extract overlapping en-tity. Experiments on three benchmark datasets show that our model consistently outperforms previous optimal models by a large margin, creating new state-of-the-art results.

computational linguistic, representation, semantic representation, (14 more...)

arXiv.org Artificial Intelligence

2105.1008

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Italy > Tuscany > Florence (0.04)
Europe > Denmark > Capital Region > Copenhagen (0.04)
(10 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)

Add feedback

From Start to Finish: Latency Reduction Strategies for Incremental Speech Synthesis in Simultaneous Speech-to-Speech Translation

Liu, Danni, Wang, Changhan, Gong, Hongyu, Ma, Xutai, Tang, Yun, Pino, Juan

arXiv.org Artificial IntelligenceJul-15-2022

Speech-to-speech translation (S2ST) converts input speech to speech in another language. A challenge of delivering S2ST in real time is the accumulated delay between the translation and speech synthesis modules. While recently incremental text-to-speech (iTTS) models have shown large quality improvements, they typically require additional future text inputs to reach optimal performance. In this work, we minimize the initial waiting time of iTTS by adapting the upstream speech translator to generate high-quality pseudo lookahead for the speech synthesizer. After mitigating the initial delay, we demonstrate that the duration of synthesized speech also plays a crucial role on latency. We formalize this as a latency metric and then present a simple yet effective duration-scaling approach for latency reduction. Our approaches consistently reduce latency by 0.2-0.5 second without sacrificing speech translation quality.

latency, lookahead, pseudo lookahead, (17 more...)

arXiv.org Artificial Intelligence

2110.08214

Country:

North America > United States (0.04)
North America > Canada > Quebec > Montreal (0.04)
Europe > Netherlands > Limburg > Maastricht (0.04)
Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Synthesis (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

No Language Left Behind

#artificialintelligenceJul-14-2022, 16:50:17 GMT

Originally published on Towards AI the World's Leading AI and Technology News and Media Company. If you are building an AI-related product or service, we invite you to consider becoming an AI sponsor. At Towards AI, we help scale AI and technology startups. Let us help you unleash your technology to the masses. The limits of my language mean the limits of my world.

artificial intelligence, machine translation, natural language, (15 more...)

#artificialintelligence

Country:

Europe (0.05)
South America (0.05)
North America > United States (0.05)
(2 more...)

Industry: Government (0.48)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Add feedback

Open Terminology Management and Sharing Toolkit for Federation of Terminology Databases

Lagzdiņš, Andis, Siliņš, Uldis, Pinnis, Mārcis, Bergmanis, Toms, Vasiļevskis, Artūrs, Vasiļjevs, Andrejs

arXiv.org Artificial IntelligenceJul-14-2022

Consolidated access to current and reliable terms from different subject fields and languages is necessary for content creators and translators. Terminology is also needed in AI applications such as machine translation, speech recognition, information extraction, and other natural language processing tools. In this work, we facilitate standards-based sharing and management of terminology resources by providing an open terminology management solution - the EuroTermBank Toolkit. It allows organisations to manage and search their terms, create term collections, and share them within and outside the organisation by participating in the network of federated databases. The data curated in the federated databases are automatically shared with EuroTermBank, the largest multilingual terminology resource in Europe, allowing translators and language service providers as well as researchers and students to access terminology resources in their most current version.

eurotermbank, terminology, translation, (14 more...)

arXiv.org Artificial Intelligence

2207.06729

Country:

Europe > Latvia > Riga Municipality > Riga (0.05)
Europe > Estonia (0.05)
North America > Dominican Republic (0.04)
(8 more...)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Add feedback