AITopics | Machine Translation

Collaborating Authors

Machine Translation

"Machine translation (MT) is the application of computers to the task of translating texts from one natural language to another. One of the very earliest pursuits in computer science, MT has proved to be an elusive goal, but today a number of systems are available which produce output which, if not perfect, is of sufficient quality to be useful in a number of specific domains."
– Definition from the European Association for Machine Translation (EAMT).

You can translate text of your choice by using free translators such as: CAPITA, Google Translate, SDL International, SYSTRAN.

News Overviews Instructional Materials AI-Alerts Classics

Importance-Aware Data Augmentation for Document-Level Neural Machine Translation

Wu, Minghao, Wang, Yufei, Foster, George, Qu, Lizhen, Haffari, Gholamreza

arXiv.org Artificial IntelligenceJan-27-2024

Document-level neural machine translation (DocNMT) aims to generate translations that are both coherent and cohesive, in contrast to its sentence-level counterpart. However, due to its longer input length and limited availability of training data, DocNMT often faces the challenge of data sparsity. To overcome this issue, we propose a novel Importance-Aware Data Augmentation (IADA) algorithm for DocNMT that augments the training data based on token importance information estimated by the norm of hidden states and training gradients. We conduct comprehensive experiments on three widely-used DocNMT benchmarks. Our empirical results show that our proposed IADA outperforms strong DocNMT baselines as well as several data augmentation approaches, with statistical significance on both sentence-level and document-level BLEU.

computational linguistic, linguistic, translation, (14 more...)

arXiv.org Artificial Intelligence

2401.1536

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Ireland > Leinster > County Dublin > Dublin (0.05)
Europe > Germany > Berlin (0.04)
(20 more...)

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Add feedback

Misgendering and Assuming Gender in Machine Translation when Working with Low-Resource Languages

Ghosh, Sourojit, Chatterjee, Srishti

arXiv.org Artificial IntelligenceJan-27-2024

This chapter focuses on gender-related errors in machine translation (MT) in the context of low-resource languages. We begin by explaining what low-resource languages are, examining the inseparable social and computational factors that create such linguistic hierarchies. We demonstrate through a case study of our mother tongue Bengali, a global language spoken by almost 300 million people but still classified as low-resource, how gender is assumed and inferred in translations to and from the high(est)-resource English when no such information is provided in source texts. We discuss the postcolonial and societal impacts of such errors leading to linguistic erasure and representational harms, and conclude by discussing potential solutions towards uplifting languages by providing them more agency in MT conversations.

bengali, machine translation, translation, (11 more...)

arXiv.org Artificial Intelligence

2401.13165

Country:

Asia > China (0.15)
North America > United States > New York > New York County > New York City (0.05)
Asia > Pakistan (0.05)
(23 more...)

Genre: Research Report > Promising Solution (0.34)

Industry:

Government > Regional Government (0.68)
Law (0.68)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Add feedback

Language Modelling Approaches to Adaptive Machine Translation

Moslem, Yasmin

arXiv.org Artificial IntelligenceJan-25-2024

Consistency is a key requirement of high-quality translation. It is especially important to adhere to pre-approved terminology and adapt to corrected translations in domain-specific projects. Machine translation (MT) has achieved significant progress in the area of domain adaptation. However, in-domain data scarcity is common in translation settings, due to the lack of specialised datasets and terminology, or inconsistency and inaccuracy of available in-domain translations. In such scenarios where there is insufficient in-domain data to fine-tune MT models, producing translations that are consistent with the relevant context is challenging. While real-time adaptation can make use of smaller amounts of in-domain data to improve the translation on the fly, it remains challenging due to supported context limitations and efficiency constraints. Large language models (LLMs) have recently shown interesting capabilities of in-context learning, where they learn to replicate certain input-output text generation patterns, without further fine-tuning. Such capabilities have opened new horizons for domain-specific data augmentation and real-time adaptive MT. This work attempts to address two main relevant questions: 1) in scenarios involving human interaction and continuous feedback, can we employ language models to improve the quality of adaptive MT at inference time? and 2) in the absence of sufficient in-domain data, can we use pre-trained large-scale language models to improve the process of MT domain adaptation?

bilingual synthetic data, domain terminology integration, generate diverse alternative, (13 more...)

arXiv.org Artificial Intelligence

2401.14559

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Portugal > Lisbon > Lisbon (0.13)
(43 more...)

Genre:

Workflow (1.00)
Research Report > New Finding (1.00)
Questionnaire & Opinion Survey (0.92)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Law (0.92)
Information Technology (0.92)
Health & Medicine > Therapeutic Area > Immunology (0.92)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

TURNA: A Turkish Encoder-Decoder Language Model for Enhanced Understanding and Generation

Uludoğan, Gökçe, Balal, Zeynep Yirmibeşoğlu, Akkurt, Furkan, Türker, Melikşah, Güngör, Onur, Üsküdarlı, Susan

arXiv.org Artificial IntelligenceJan-25-2024

The recent advances in natural language processing have predominantly favored well-resourced English-centric models, resulting in a significant gap with low-resource languages. In this work, we introduce the language model TURNA, which is developed for the low-resource language Turkish and is capable of both natural language understanding and generation tasks. TURNA is pretrained with an encoder-decoder architecture based on the unified framework UL2 with a diverse corpus that we specifically curated for this purpose. We evaluated TURNA with three generation tasks and five understanding tasks for Turkish. The results show that TURNA outperforms several multilingual models in both understanding and generation tasks, and competes with monolingual Turkish models in understanding tasks. TURNA is made available at https://huggingface.co/boun-tabi-LMG/TURNA .

computational linguistic, dataset, proceedings, (13 more...)

arXiv.org Artificial Intelligence

2401.14373

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Italy (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
(14 more...)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)

Add feedback

Improved Cross-Lingual Transfer Learning For Automatic Speech Translation

Khurana, Sameer, Dawalatabad, Nauman, Laurent, Antoine, Vicente, Luis, Gimeno, Pablo, Mingote, Victoria, Glass, James

arXiv.org Artificial IntelligenceJan-25-2024

Research in multilingual speech-to-text translation is topical. Having a single model that supports multiple translation tasks is desirable. The goal of this work it to improve cross-lingual transfer learning in multilingual speech-to-text translation via semantic knowledge distillation. We show that by initializing the encoder of the encoder-decoder sequence-to-sequence translation model with SAMU-XLS-R, a multilingual speech transformer encoder trained using multi-modal (speech-text) semantic knowledge distillation, we achieve significantly better cross-lingual task knowledge transfer than the baseline XLS-R, a multilingual speech transformer encoder trained via self-supervised learning. We demonstrate the effectiveness of our approach on two popular datasets, namely, CoVoST-2 and Europarl. On the 21 translation tasks of the CoVoST-2 benchmark, we achieve an average improvement of 12.8 BLEU points over the baselines. In the zero-shot translation scenario, we achieve an average gain of 18.8 and 11.9 average BLEU points on unseen medium and low-resource languages. We make similar observations on Europarl speech translation benchmark.

encoder, translation model, translation task, (13 more...)

arXiv.org Artificial Intelligence

2306.00789

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(7 more...)

Genre: Research Report (0.52)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

MambaByte: Token-free Selective State Space Model

Wang, Junxiong, Gangavarapu, Tushaar, Yan, Jing Nathan, Rush, Alexander M

arXiv.org Artificial IntelligenceJan-24-2024

Token-free language models learn directly from raw bytes and remove the bias of subword tokenization. Operating on bytes, however, results in significantly longer sequences, and standard autoregressive Transformers scale poorly in such settings. We experiment with MambaByte, a token-free adaptation of the Mamba state space model, trained autoregressively on byte sequences. Our experiments indicate the computational efficiency of MambaByte compared to other byte-level models. We also find MambaByte to be competitive with and even outperform state-of-the-art subword Transformers. Furthermore, owing to linear scaling in length, MambaByte benefits from fast inference compared to Transformers. Our findings establish the viability of MambaByte in enabling token-free language modeling.

byte, mambabyte, transformer, (14 more...)

arXiv.org Artificial Intelligence

2401.1366

Country:

North America > Canada > Quebec > Montreal (0.04)
North America > Canada > British Columbia (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
(4 more...)

Genre: Research Report > New Finding (0.66)

Industry: Transportation > Ground > Rail (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns

DuSell, Brian, Chiang, David

arXiv.org Artificial IntelligenceJan-24-2024

Attention, specifically scaled dot-product attention, has proven effective for natural language, but it does not have a mechanism for handling hierarchical patterns of arbitrary nesting depth, which limits its ability to recognize certain syntactic structures. To address this shortcoming, we propose stack attention: an attention operator that incorporates stacks, inspired by their theoretical connections to context-free languages (CFLs). We show that stack attention is analogous to standard attention, but with a latent model of syntax that requires no syntactic supervision. We propose two variants: one related to deterministic pushdown automata (PDAs) and one based on nondeterministic PDAs, which allows transformers to recognize arbitrary CFLs. We show that transformers with stack attention are very effective at learning CFLs that standard transformers struggle on, achieving strong results on a CFL with theoretically maximal parsing difficulty. We also show that stack attention is more effective at natural language modeling under a constrained parameter budget, and we include results on machine translation.

differentiable stack, stack attention, transformer, (15 more...)

arXiv.org Artificial Intelligence

2310.01749

Country:

North America > United States > California > San Diego County > San Diego (0.04)
Asia > China > Hong Kong (0.04)
North America > United States > New York > New York County > New York City (0.04)
(18 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

How Transferable are Attribute Controllers on Pretrained Multilingual Translation Models?

Liu, Danni, Niehues, Jan

arXiv.org Artificial IntelligenceJan-24-2024

Customizing machine translation models to comply with desired attributes (e.g., formality or grammatical gender) is a well-studied topic. However, most current approaches rely on (semi-)supervised data with attribute annotations. This data scarcity bottlenecks democratizing such customization possibilities to a wider range of languages, particularly lower-resource ones. This gap is out of sync with recent progress in pretrained massively multilingual translation models. In response, we transfer the attribute controlling capabilities to languages without attribute-annotated data with an NLLB-200 model as a foundation. Inspired by techniques from controllable generation, we employ a gradient-based inference-time controller to steer the pretrained model. The controller transfers well to zero-shot conditions, as it operates on pretrained multilingual representations and is attribute -- rather than language-specific. With a comprehensive comparison to finetuning-based control, we demonstrate that, despite finetuning's clear dominance in supervised settings, the gap to inference-time control closes when moving to zero-shot conditions, especially with new and distant target languages. The latter also shows stronger domain robustness. We further show that our inference-time control complements finetuning. A human evaluation on a real low-resource language, Bengali, confirms our findings. Our code is https://github.com/dannigt/attribute-controller-transfer

computational linguistic, proceedings, translation, (15 more...)

arXiv.org Artificial Intelligence

2309.08565

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Canada > Ontario > Toronto (0.05)
Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
(18 more...)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Locality enhanced dynamic biasing and sampling strategies for contextual ASR

Jalal, Md Asif, Parada, Pablo Peso, Pavlidis, George, Moschopoulos, Vasileios, Saravanan, Karthikeyan, Kontoulis, Chrysovalantis-Giorgos, Zhang, Jisi, Drosou, Anastasios, Lee, Gil Ho, Lee, Jungin, Jung, Seokyeong

arXiv.org Artificial IntelligenceJan-23-2024

Automatic Speech Recognition (ASR) still face challenges when recognizing time-variant rare-phrases. Contextual biasing (CB) modules bias ASR model towards such contextually-relevant phrases. During training, a list of biasing phrases are selected from a large pool of phrases following a sampling strategy. In this work we firstly analyse different sampling strategies to provide insights into the training of CB for ASR with correlation plots between the bias embeddings among various training stages. Secondly, we introduce a neighbourhood attention (NA) that localizes self attention (SA) to the nearest neighbouring frames to further refine the CB output. The results show that this proposed approach provides on average a 25.84% relative WER improvement on LibriSpeech sets and rare-word evaluation compared to the baseline.

contextual, representation, speech recognition, (14 more...)

arXiv.org Artificial Intelligence

2401.13146

Country:

Europe > United Kingdom (0.04)
Europe > Greece (0.04)
Asia > South Korea > Gyeonggi-do > Suwon (0.04)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)
Information Technology > Artificial Intelligence > Vision (0.68)

Add feedback

Oolong: Investigating What Makes Transfer Learning Hard with Controlled Studies

Wu, Zhengxuan, Tamkin, Alex, Papadimitriou, Isabel

arXiv.org Artificial IntelligenceJan-23-2024

When we transfer a pretrained language model to a new language, there are many axes of variation that change at once. To disentangle the impact of different factors like syntactic similarity and vocabulary similarity, we propose a set of controlled transfer studies: we systematically transform the language of the GLUE benchmark, altering one axis of crosslingual variation at a time, and then measure the resulting drops in a pretrained model's downstream performance. We find that models can largely recover from syntactic-style shifts, but cannot recover from vocabulary misalignment and embedding matrix re-initialization, even with continued pretraining on 15 million tokens. %On the other hand, transferring to a dataset with an unaligned vocabulary is extremely hard to recover from in the low-data regime. Moreover, good-quality tokenizers in the transfer language do not make vocabulary alignment easier. Our experiments provide insights into the factors of cross-lingual transfer that researchers should most focus on when designing language transfer scenarios.

arxiv preprint arxiv, computational linguistic, tokenizer, (12 more...)

arXiv.org Artificial Intelligence

2202.12312

Country:

North America > Dominican Republic (0.04)
North America > United States > Washington > King County > Seattle (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
(2 more...)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.42)

Add feedback