AITopics

2306.01393

Country: Europe > Switzerland > Vaud > Lausanne (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Raunak, Vikas, Menezes, Arul, Post, Matt, Awadalla, Hany Hassan

Do GPTs Produce Less Literal Translations?

arXiv.org Artificial Intelligence, Jun-5-2023

Large Language Models (LLMs) such as GPT-3 have emerged as general-purpose language models capable of addressing many natural language generation or understanding tasks. On the task of Machine Translation (MT), multiple works have investigated few-shot prompting mechanisms to elicit better translations from LLMs. However, there has been relatively little investigation on how such translations differ qualitatively from the translations generated by standard Neural Machine Translation (NMT) models. In this work, we investigate these differences in terms of the literalness of translations produced by the two systems. Using literalness measures involving word alignment and monotonicity, we find that translations out of English (E-X) from GPTs tend to be less literal, while exhibiting similar or better scores on MT quality metrics. We demonstrate that this finding is borne out in human evaluations as well. We then show that these differences are especially pronounced when translating sentences that contain idiomatic expressions.
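As a rough illustration of the kind of alignment-based literalness measures the abstract above mentions, the following sketch computes two quantities from a word alignment: the fraction of source words left unaligned, and the fraction of alignment-link pairs that cross. The function names, the alignment format (a list of (source_index, target_index) pairs) and the exact definitions are assumptions made for illustration; the paper's own measures may be defined differently.

```python
from typing import List, Tuple

# Assumed format: each link is a (source_index, target_index) pair.
Alignment = List[Tuple[int, int]]


def unaligned_source_fraction(num_source_words: int, alignment: Alignment) -> float:
    """Fraction of source words that have no aligned target word."""
    if num_source_words == 0:
        return 0.0
    aligned_sources = {s for s, _ in alignment}
    return (num_source_words - len(aligned_sources)) / num_source_words


def non_monotonicity(alignment: Alignment) -> float:
    """Fraction of link pairs (with distinct source words) whose target order is reversed."""
    links = sorted(alignment)  # sort by source index, then target index
    total = 0
    crossing = 0
    for i in range(len(links)):
        for j in range(i + 1, len(links)):
            (s1, t1), (s2, t2) = links[i], links[j]
            if s1 == s2:
                continue  # links from the same source word cannot cross
            total += 1
            if t2 < t1:
                crossing += 1
    return crossing / total if total else 0.0
```

Under these assumed definitions, a word-for-word translation scores low on both measures, while a freer translation that drops or reorders source words scores higher.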

large language model, machine learning, translation, (19 more...)

2305.16806

Country:

Asia > Middle East > Iran (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Pennsylvania (0.04)
(10 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)

Bao, Guangsheng, Teng, Zhiyang, Zhang, Yue

Target-Side Augmentation for Document-Level Machine Translation

Document-level machine translation faces the challenge of data sparsity due to its long input length and a small amount of training data, increasing the risk of learning spurious patterns. To address this challenge, we propose a target-side augmentation method, introducing a data augmentation (DA) model to generate many potential translations for each source document. By learning from this wider range of translations, an MT model can learn a smoothed distribution, thereby reducing the risk of data sparsity. We demonstrate that the DA model, which estimates the posterior distribution, substantially improves MT performance, outperforming the previous best system by 2.30 s-BLEU on News and achieving a new state of the art on the News and Europarl benchmarks. Our code is available at https://github.com/baoguangsheng/target-side-augmentation.

machine learning, natural language, translation, (19 more...)

2305.04505

Country:

Asia > Russia (0.28)
Asia > China (0.04)
Europe > Spain (0.04)
(4 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Zhou, Wangchunshu, Bras, Ronan Le, Choi, Yejin

Modular Transformers: Compressing Transformers into Modularized Layers for Flexible Efficient Inference

Pre-trained Transformer models like T5 and BART have advanced the state of the art on a wide range of text generation tasks. Compressing these models into smaller ones has become critically important for practical use. Common neural network compression techniques such as knowledge distillation or quantization are limited to static compression where the compression ratio is fixed. In this paper, we introduce Modular Transformers, a modularized encoder-decoder framework for flexible sequence-to-sequence model compression. Modular Transformers train modularized layers that have the same function as two or more consecutive layers in the original model via module replacing and knowledge distillation. After training, the modularized layers can be flexibly assembled into sequence-to-sequence models that meet different performance-efficiency trade-offs. Experimental results show that after a single training phase, by simply varying the assembling strategy, Modular Transformers can achieve flexible compression ratios from 1.1x to 6x with little to moderate relative performance drop.

artificial intelligence, machine learning, natural language, (19 more...)

2306.02379

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Texas (0.04)
North America > Dominican Republic (0.04)
(16 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Liu, Yihong, Chronopoulou, Alexandra, Schütze, Hinrich, Fraser, Alexander

On the Copying Problem of Unsupervised NMT: A Training Schedule with a Language Discriminator Loss

Although unsupervised neural machine translation (UNMT) has achieved success in many language pairs, the copying problem, i.e., directly copying some parts of the input sentence as the translation, is common among distant language pairs, especially when low-resource languages are involved. We find this issue is closely related to an unexpected copying behavior during online back-translation (BT). In this work, we propose a simple but effective training schedule that incorporates a language discriminator loss. The loss imposes constraints on the intermediate translation so that the translation is in the desired language. By conducting extensive experiments on different language pairs, including similar and distant, high and low-resource languages, we find that our method alleviates the copying problem, thus improving the translation performance on low-resource languages.
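To make the copying problem described above concrete, here is a minimal sketch of one way it could be quantified: the share of output tokens that also occur verbatim in the source sentence. The function name `copy_ratio` and the whitespace tokenization are assumptions for illustration; the paper may measure copying differently.

```python
def copy_ratio(source: str, hypothesis: str) -> float:
    """Share of hypothesis tokens that appear verbatim in the source sentence."""
    source_tokens = set(source.split())  # naive whitespace tokenization (assumption)
    hypothesis_tokens = hypothesis.split()
    if not hypothesis_tokens:
        return 0.0
    copied = sum(token in source_tokens for token in hypothesis_tokens)
    return copied / len(hypothesis_tokens)
```

A ratio near 1.0 suggests the system is echoing its input rather than translating it. Shared names and numbers are legitimately copied, so a nonzero ratio is not by itself an error.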

artificial intelligence, natural language, translation, (19 more...)

2305.17182

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Portugal > Lisbon > Lisbon (0.14)
Europe > Italy > Tuscany > Florence (0.04)
(12 more...)

Genre: Research Report > Experimental Study (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Investigating Massive Multilingual Pre-Trained Machine Translation Models for Clinical Domain via Transfer Learning

Han, Lifeng, Erofeev, Gleb, Sorokina, Irina, Gladkoff, Serge, Nenadic, Goran

Massively multilingual pre-trained language models (MMPLMs) have been developed in recent years, demonstrating strong capabilities and acquiring prior knowledge that benefits downstream tasks. This work investigates whether MMPLMs can be applied to clinical domain machine translation (MT) towards entirely unseen languages via transfer learning. We carry out an experimental investigation using Meta-AI's MMPLMs "wmt21-dense-24-wide-en-X and X-en (WMT21fb)", which were pre-trained on 7 language pairs and 14 translation directions including English to Czech, German, Hausa, Icelandic, Japanese, Russian, and Chinese, and the opposite direction. We fine-tune these MMPLMs towards the English-Spanish language pair, which did not exist at all in their original pre-trained corpora, either implicitly or explicitly. We prepare carefully aligned clinical domain data for this fine-tuning, which is different from their original mixed domain knowledge. Our experimental results show that the fine-tuning is very successful using just 250k well-aligned in-domain EN-ES segments on three translation sub-tasks: clinical cases, clinical terms, and ontology concepts. It achieves evaluation scores very close to those of another MMPLM, NLLB from Meta-AI, which included Spanish as a high-resource setting in the pre-training. To the best of our knowledge, this is the first work to successfully use MMPLMs for clinical domain transfer-learning NMT on languages totally unseen during pre-training.

machine learning, natural language, translation, (17 more...)

2210.06068

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California > San Diego County > San Diego (0.04)
Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
(11 more...)

Genre: Research Report > New Finding (0.34)

Industry: Health & Medicine (0.94)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Artificial Intelligence, Jun-3-2023

Extract and Attend: Improving Entity Translation in Neural Machine Translation

Zeng, Zixin, Wang, Rui, Leng, Yichong, Guo, Junliang, Tan, Xu, Qin, Tao, Liu, Tie-yan

While Neural Machine Translation (NMT) has achieved great progress in recent years, it still suffers from inaccurate translation of entities (e.g., person/organization name, location), due to the lack of entity training instances. When we humans encounter an unknown entity during translation, we usually first look it up in a dictionary and then organize the entity translation together with the translations of other parts to form a smooth target sentence. Inspired by this translation process, we propose an Extract-and-Attend approach to enhance entity translation in NMT, where the translation candidates of source entities are first extracted from a dictionary and then attended to by the NMT model to generate the target sentence. Specifically, the translation candidates are extracted by first detecting the entities in a source sentence and then translating the entities by looking them up in a dictionary. Then, the extracted candidates are added as a prefix of the decoder input to be attended to by the decoder through self-attention when generating the target sentence. Experiments conducted on En-Zh and En-Ru demonstrate that the proposed method is effective in improving both the translation accuracy of entities and the overall translation quality, with up to a 35% reduction in entity error rate, a 0.85 gain in BLEU, and a 13.8 gain in COMET.
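The extract step described above (detect entities, look them up, prepend the candidates to the decoder input) can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: entity detection is replaced by a plain dictionary match on whitespace tokens, the dictionary entries are placeholders, and the `<sep>` separator token and function names are hypothetical.

```python
from typing import Dict, List, Tuple


def extract_candidates(
    source_tokens: List[str], dictionary: Dict[str, List[str]]
) -> List[Tuple[str, List[str]]]:
    """Return (entity, candidate translations) for each source token found in the dictionary.

    A real system would run an entity detector first; dictionary matching is a stand-in.
    """
    return [(tok, dictionary[tok]) for tok in source_tokens if tok in dictionary]


def build_decoder_prefix(
    source_tokens: List[str], dictionary: Dict[str, List[str]], sep: str = "<sep>"
) -> List[str]:
    """Concatenate the candidates of all detected entities, each group closed by a separator."""
    prefix: List[str] = []
    for _, candidates in extract_candidates(source_tokens, dictionary):
        prefix.extend(candidates)
        prefix.append(sep)  # hypothetical separator token
    return prefix


# Placeholder dictionary: the target-side strings are dummies, not real translations.
toy_dictionary = {"Alice": ["ALICE_TGT"], "Zorbia": ["ZORBIA_TGT_1", "ZORBIA_TGT_2"]}
prefix = build_decoder_prefix("Alice flew to Zorbia".split(), toy_dictionary)
```

In a full model, the decoder would start generation after this prefix, so its self-attention can attend to the candidate translations while producing the target sentence.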

machine learning, natural language, translation, (18 more...)

2306.02242

Country:

Asia > Singapore (0.05)
Asia > China (0.05)
Europe > Portugal > Lisbon > Lisbon (0.04)
(7 more...)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.38)

Gérardin, Christel, Xiong, Yuhan, Wajsbürt, Perceval, Carrat, Fabrice, Tannier, Xavier

Impact of translation on biomedical information extraction from real-life clinical notes

arXiv.org Artificial Intelligence, Jun-3-2023

The objective of our study is to determine whether using English tools to extract and normalize French medical concepts on translations provides comparable performance to French models trained on a set of annotated French clinical notes. We compare two methods: a method involving French language models and a method involving English language models. For the native French method, the Named Entity Recognition (NER) and normalization steps are performed separately. For the translated English method, after the first translation step, we compare a two-step method and a terminology-oriented method that performs extraction and normalization at the same time. We used French, English and bilingual annotated datasets to evaluate all steps (NER, normalization and translation) of our algorithms. Concerning the results, the native French method performs better than the translated English one, with a global F1 score of 0.51 [0.47;0.55] against 0.39 [0.34;0.44] and 0.38 [0.36;0.40] for the two English methods tested. In conclusion, despite the recent improvement of translation models, there is a significant performance difference between the two approaches in favor of the native French method, which is more efficient on French medical texts, even with few annotated documents.

artificial intelligence, natural language, translation, (17 more...)

2306.02042

Country:

Asia > China > Shanghai > Shanghai (0.04)
North America > United States (0.04)
Europe > Portugal > Lisbon > Lisbon (0.04)
(4 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Health Care Technology > Medical Record (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.90)

Saxon, Michael, Wang, William Yang

Multilingual Conceptual Coverage in Text-to-Image Models

arXiv.org Artificial Intelligence, Jun-2-2023

We propose "Conceptual Coverage Across Languages" (CoCo-CroLa), a technique for benchmarking the degree to which any generative text-to-image system provides multilingual parity to its training language in terms of tangible nouns. For each model we can assess "conceptual coverage" of a given target language relative to a source language by comparing the population of images generated for a series of tangible nouns in the source language to the population of images generated for each noun under translation in the target language. This technique allows us to estimate how well-suited a model is to a target language as well as identify model-specific weaknesses, spurious correlations, and biases without a priori assumptions. We demonstrate how it can be used to benchmark T2I models in terms of multilinguality, and how despite its simplicity it is a good proxy for impressive generalization.

large language model, machine learning, natural language, (16 more...)

2306.01735

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > California > Santa Barbara County > Santa Barbara (0.04)
(8 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.33)

arXiv.org Artificial Intelligence, Jun-2-2023

Evaluating Machine Translation Quality with Conformal Predictive Distributions

Giovannotti, Patrizio

This paper presents a new approach for assessing uncertainty in machine translation by simultaneously evaluating translation quality and providing a reliable confidence score. Our approach utilizes conformal predictive distributions to produce prediction intervals with guaranteed coverage, meaning that for any given significance level $\epsilon$, we can expect the true quality score of a translation to fall within the interval at a rate of $1-\epsilon$. In this paper, we demonstrate how our method outperforms a simple but effective baseline on six different language pairs in terms of coverage and sharpness. Furthermore, we validate that our approach requires the data exchangeability assumption to hold for optimal performance.
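The coverage guarantee can be illustrated with a plain split-conformal interval, a simpler relative of the conformal predictive distributions used in the paper (it is swapped in here for brevity and is not the paper's method). The sketch assumes a held-out calibration set of residuals (true quality score minus predicted score) that is exchangeable with the test data; the function names are hypothetical.

```python
import math
from typing import List, Tuple


def conformal_quantile(calibration_residuals: List[float], epsilon: float) -> float:
    """Return the ceil((n + 1) * (1 - epsilon))-th smallest absolute residual.

    If that rank exceeds n, no finite interval can guarantee the coverage,
    so infinity is returned.
    """
    n = len(calibration_residuals)
    rank = math.ceil((n + 1) * (1 - epsilon))
    if rank > n:
        return math.inf
    return sorted(abs(r) for r in calibration_residuals)[rank - 1]


def prediction_interval(
    point_prediction: float, calibration_residuals: List[float], epsilon: float
) -> Tuple[float, float]:
    """Symmetric interval that contains the true score with probability at least 1 - epsilon."""
    q = conformal_quantile(calibration_residuals, epsilon)
    return point_prediction - q, point_prediction + q
```

Smaller values of epsilon demand higher coverage and therefore yield wider intervals; with too few calibration points the interval becomes unbounded.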

artificial intelligence, machine learning, natural language, (16 more...)

2306.01549

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > Dominican Republic (0.04)
(7 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)