AITopics | Machine Translation

Collaborating Authors

Machine Translation

"Machine translation (MT) is the application of computers to the task of translating texts from one natural language to another. One of the very earliest pursuits in computer science, MT has proved to be an elusive goal, but today a number of systems are available which produce output which, if not perfect, is of sufficient quality to be useful in a number of specific domains."
– Definition from the European Association for Machine Translation (EAMT).

You can translate text of your choice by using free translators such as: CAPITA, Google Translate, SDL International, SYSTRAN.

News Overviews Instructional Materials AI-Alerts Classics

Approaching English-Polish Machine Translation Quality Assessment with Neural-based Methods

Nowakowski, Artur

arXiv.org Artificial IntelligenceSep-22-2022

This paper presents our contribution to the PolEval 2021 Task 2: Evaluation of translation quality assessment metrics. We describe experiments with pre-trained language models and state-of-the-art frameworks for translation quality assessment in both nonblind and blind versions of the task. Our solutions ranked second in the nonblind version and third in the blind version.

artificial intelligence, computational linguistic, natural language, (11 more...)

arXiv.org Artificial Intelligence

2209.11016

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Oceania > Australia > Victoria > Melbourne (0.05)
Europe > Portugal > Lisbon > Lisbon (0.05)
(7 more...)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Add feedback

Extending Word-Level Quality Estimation for Post-Editing Assistance

Wei, Yizhen, Utsuro, Takehito, Nagata, Masaaki

arXiv.org Artificial IntelligenceSep-22-2022

We define a novel concept called extended word alignment in order to improve post-editing assistance efficiency. Based on extended word alignment, we further propose a novel task called refined word-level QE that outputs refined tags and word-level correspondences. Compared to original word-level QE, the new task is able to directly point out editing operations, thus improves efficiency. To extract extended word alignment, we adopt a supervised method based on mBERT. To solve refined word-level QE, we firstly predict original QE tags by training a regression model for sequence tagging based on mBERT and XLM-R. Then, we refine original word tags with extended word alignment. In addition, we extract source-gap correspondences, meanwhile, obtaining gap tags. Experiments on two language pairs show the feasibility of our method and give us inspirations for further improvement.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2209.11378

Country:

North America > United States (0.04)
Asia > Japan > Honshū > Kantō > Ibaraki Prefecture > Tsukuba (0.04)
Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)
Africa > Middle East > Egypt > Giza Governorate > Giza (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Add feedback

XF2T: Cross-lingual Fact-to-Text Generation for Low-Resource Languages

Sagare, Shivprasad, Abhishek, Tushar, Singh, Bhavyajeet, Sharma, Anubhav, Gupta, Manish, Varma, Vasudeva

arXiv.org Artificial IntelligenceSep-22-2022

Multiple business scenarios require an automated generation of descriptive human-readable text from structured input data. Hence, fact-to-text generation systems have been developed for various downstream tasks like generating soccer reports, weather and financial reports, medical reports, person biographies, etc. Unfortunately, previous work on fact-to-text (F2T) generation has focused primarily on English mainly due to the high availability of relevant datasets. Only recently, the problem of cross-lingual fact-to-text (XF2T) was proposed for generation across multiple languages alongwith a dataset, XALIGN for eight languages. However, there has been no rigorous work on the actual XF2T generation problem. We extend XALIGN dataset with annotated data for four more languages: Punjabi, Malayalam, Assamese and Oriya. We conduct an extensive study using popular Transformer-based text generation models on our extended multi-lingual dataset, which we call XALIGNV2. Further, we investigate the performance of different text generation strategies: multiple variations of pretraining, fact-aware embeddings and structure-aware input encoding. Our extensive experiments show that a multi-lingual mT5 model which uses fact-aware embeddings with structure-aware input encoding leads to best results on average across the twelve languages. We make our code, dataset and model publicly available, and hope that this will help advance further research in this critical area.

artificial intelligence, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2209.11252

Country:

Asia > India > Uttar Pradesh (0.04)
Asia > India > Gujarat (0.04)
Africa > South Africa (0.04)
(8 more...)

Genre: Research Report (0.50)

Industry:

Leisure & Entertainment > Sports (1.00)
Government (0.94)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)
Information Technology > Artificial Intelligence > Natural Language > Generation (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

Add feedback

Too Much Trust in Machine Translation Could Have Deadly Consequences

SlateSep-21-2022, 09:50:00 GMT

Imagine you are in a foreign country where you don't speak the language and your small child unexpectedly starts to have a fever seizure. You take them to the hospital, and the doctors use an online translator to let you know that your kid is going to be OK. But "your child is having a seizure" accidentally comes up in your mother tongue is "your child is dead." This specific example is a very real possibility, according to a 2014 study published in the British Medical Journal about the limited usefulness of AI-powered machine translation in communications between patients and doctors. Sometimes we need American-British translation, too.)

google translate, machine translation, translation, (14 more...)

Slate

Country:

North America > United States > Kansas (0.05)
North America > United States > Arizona (0.05)
Europe > Denmark > Capital Region > Copenhagen (0.05)
(2 more...)

Genre: Research Report > New Finding (0.49)

Industry:

Information Technology (0.96)
Government (0.96)
Law Enforcement & Public Safety (0.69)
Health & Medicine > Health Care Providers & Services (0.35)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Add feedback

Prose2Poem: The Blessing of Transformers in Translating Prose to Persian Poetry

Khanmohammadi, Reza, Mirshafiee, Mitra Sadat, Jouryabi, Yazdan Rezaee, Mirroshandel, Seyed Abolghasem

arXiv.org Artificial IntelligenceSep-21-2022

Persian Poetry has consistently expressed its philosophy, wisdom, speech, and rationale on the basis of its couplets, making it an enigmatic language on its own to both native and non-native speakers. Nevertheless, the notice able gap between Persian prose and poem has left the two pieces of literature medium-less. Having curated a parallel corpus of prose and their equivalent poems, we introduce a novel Neural Machine Translation (NMT) approach to translate prose to ancient Persian poetry using transformer-based Language Models in an extremely low-resource setting. More specifically, we trained a Transformer model from scratch to obtain initial translations and pretrained different variations of BERT to obtain final translations. To address the challenge of using masked language modelling under poeticness criteria, we heuristically joined the two models and generated valid poems in terms of automatic and human assessments. Final results demonstrate the eligibility and creativity of our novel heuristically aided approach among Literature professionals and non-professionals in generating novel Persian poems.

artificial intelligence, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2109.14934

Country:

Asia > Middle East > Iran (0.05)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
(8 more...)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

INFINITY: A Simple Yet Effective Unsupervised Framework for Graph-Text Mutual Conversion

Xu, Yi, Fu, Luoyi, Lin, Zhouhan, Qi, Jiexing, Wang, Xinbing

arXiv.org Artificial IntelligenceSep-21-2022

Graph-to-text (G2T) generation and text-to-graph (T2G) triple extraction are two essential tasks for constructing and applying knowledge graphs. Existing unsupervised approaches turn out to be suitable candidates for jointly learning the two tasks due to their avoidance of using graph-text parallel data. However, they are composed of multiple modules and still require both entity information and relation type in the training process. To this end, we propose INFINITY, a simple yet effective unsupervised approach that does not require external annotation tools or additional parallel information. It achieves fully unsupervised graph-text mutual conversion for the first time. Specifically, INFINITY treats both G2T and T2G as a bidirectional sequence generation task by fine-tuning only one pretrained seq2seq model. A novel back-translation-based framework is then designed to automatically generate continuous synthetic parallel data. To obtain reasonable graph sequences with structural information from source texts, INFINITY employs reward-based training loss by leveraging the advantage of reward augmented maximum likelihood. As a fully unsupervised framework, INFINITY is empirically verified to outperform state-of-the-art baselines for G2T and T2G tasks.

artificial intelligence, machine learning, natural language, (23 more...)

arXiv.org Artificial Intelligence

2209.10754

Country:

North America > United States > Texas > Tarrant County > Arlington (0.05)
Asia > Middle East > Jordan (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.47)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.47)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.35)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.35)

Add feedback

Bridging Cross-Lingual Gaps During Leveraging the Multilingual Sequence-to-Sequence Pretraining for Text Generation and Understanding

Zan, Changtong, Ding, Liang, Shen, Li, Cao, Yu, Liu, Weifeng, Tao, Dacheng

arXiv.org Artificial IntelligenceSep-21-2022

For multilingual sequence-to-sequence pretrained language models (multilingual Seq2Seq PLMs), e.g. mBART, the self-supervised pretraining task is trained on a wide range of monolingual languages, e.g. 25 languages from CommonCrawl, while the downstream cross-lingual tasks generally progress on a bilingual language subset, e.g. English-German, making there exists the data discrepancy, namely domain discrepancy, and cross-lingual learning objective discrepancy, namely task discrepancy, between the pretraining and finetuning stages. To bridge the above cross-lingual domain and task gaps, we extend the vanilla pretrain-finetune pipeline with extra code-switching restore task. Specifically, the first stage employs the self-supervised code-switching restore task as a pretext task, allowing the multilingual Seq2Seq PLMs to acquire some in-domain alignment information. And for the second stage, we fine-tune the model on downstream data normally. Experiments on both NLG evaluation (12 bilingual translation tasks, 30 zero-shot translation tasks, and 2 cross-lingual summarization tasks) and NLU evaluation (7 cross-lingual natural language inference tasks) show our model outperforms the strong baseline mBART with standard finetuning strategy, consistently. Analyses indicate our approach could narrow the Euclidean distance of cross-lingual sentence representations, and improve the model generalization with trivial computational cost. We release the code at: https://github.com/zanchangtong/CSR4mBART.

artificial intelligence, natural language, translation, (17 more...)

arXiv.org Artificial Intelligence

2204.07834

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)
Asia > China > Shandong Province > Qingdao (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.66)

Add feedback

Relaxed Attention for Transformer Models

Lohrenz, Timo, Möller, Björn, Li, Zhengyang, Fingscheidt, Tim

arXiv.org Artificial IntelligenceSep-20-2022

The powerful modeling capabilities of all-attention-based transformer architectures often cause overfitting and - for natural language processing tasks - lead to an implicitly learned internal language model in the autoregressive transformer decoder complicating the integration of external language models. In this paper, we explore relaxed attention, a simple and easy-to-implement smoothing of the attention weights, yielding a two-fold improvement to the general transformer architecture: First, relaxed attention provides regularization when applied to the self-attention layers in the encoder. Second, we show that it naturally supports the integration of an external language model as it suppresses the implicitly learned internal language model by relaxing the cross attention in the decoder. We demonstrate the benefit of relaxed attention across several tasks with clear improvement in combination with recent benchmark approaches. Specifically, we exceed the former state-of-the-art performance of 26.90% word error rate on the largest public lip-reading LRS3 benchmark with a word error rate of 26.31%, as well as we achieve a top-performing BLEU score of 37.67 on the IWSLT14 (DE$\rightarrow$EN) machine translation task without external language models and virtually no additional model parameters. Code and models will be made publicly available.

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2209.09735

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Canada > Ontario > Toronto (0.04)
Asia > Singapore > Central Region > Singapore (0.04)
(22 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.91)

Add feedback

LINGUIST: Language Model Instruction Tuning to Generate Annotated Utterances for Intent Classification and Slot Tagging

Rosenbaum, Andy, Soltan, Saleh, Hamza, Wael, Versley, Yannick, Boese, Markus

arXiv.org Artificial IntelligenceSep-20-2022

We present LINGUIST, a method for generating annotated data for Intent Classification and Slot Tagging (IC+ST), via fine-tuning AlexaTM 5B, a 5-billion-parameter multilingual sequence-to-sequence (seq2seq) model, on a flexible instruction prompt. In a 10-shot novel intent setting for the SNIPS dataset, LINGUIST surpasses state-of-the-art approaches (Back-Translation and Example Extrapolation) by a wide margin, showing absolute improvement for the target intents of +1.9 points on IC Recall and +2.5 points on ST F1 Score. In the zero-shot cross-lingual setting of the mATIS++ dataset, LINGUIST out-performs a strong baseline of Machine Translation with Slot Alignment by +4.14 points absolute on ST F1 Score across 6 languages, while matching performance on IC. Finally, we verify our results on an internal large-scale multilingual dataset for conversational agent IC+ST and show significant improvements over a baseline which uses Back-Translation, Paraphrasing and Slot Catalog Resampling. To our knowledge, we are the first to demonstrate instruction fine-tuning of a large-scale seq2seq model to control the outputs of multilingual intent- and slot-labeled data generation.

information retrieval, large language model, machine learning, (21 more...)

arXiv.org Artificial Intelligence

2209.099

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Germany > North Rhine-Westphalia > Cologne Region > Aachen (0.04)
Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
(15 more...)

Genre: Research Report > Promising Solution (0.47)

Industry:

Leisure & Entertainment > Sports > Hockey (1.00)
Leisure & Entertainment > Sports > Baseball (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.69)
(2 more...)

Add feedback

The first neural machine translation system for the Erzya language

Dale, David

arXiv.org Artificial IntelligenceSep-19-2022

We present the first neural machine translation system for translation between the endangered Erzya language and Russian and the dataset collected by us to train and evaluate it. The BLEU scores are 17 and 19 for translation to Erzya and Russian respectively, and more than half of the translations are rated as acceptable by native speakers. We also adapt our model to translate between Erzya and 10 other languages, but without additional parallel data, the quality on these directions remains low. We release the translation models along with the collected text corpus, a new language identification model, and a multilingual sentence encoder adapted for the Erzya language. These resources will be available at https://github.com/slone-nlp/myv-nmt.

artificial intelligence, natural language, translation, (16 more...)

arXiv.org Artificial Intelligence

2209.09368

Country:

Asia > Russia (0.14)
Europe > Ireland > Leinster > County Dublin > Dublin (0.05)
Europe > Russia > Volga Federal District > Republic of Mordovia > Saransk (0.04)
(10 more...)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Add feedback