AITopics | Machine Translation

Machine Translation

"Machine translation (MT) is the application of computers to the task of translating texts from one natural language to another. One of the very earliest pursuits in computer science, MT has proved to be an elusive goal, but today a number of systems are available which produce output which, if not perfect, is of sufficient quality to be useful in a number of specific domains."
– Definition from the European Association for Machine Translation (EAMT).

You can translate text of your choice by using free translators such as: CAPITA, Google Translate, SDL International, SYSTRAN.

Learning to Scaffold: Optimizing Model Explanations for Teaching

Patrick Fernandes, Marcos Treviso, Danish Pruthi, André F. T. Martins, Graham Neubig

Neural Information Processing Systems, Aug-19-2025, 16:17:17 GMT

While deep learning's performance has led it to become the dominant paradigm in machine learning,

explanation, machine learning, natural language, (18 more...)

Country:

Europe > Portugal > Lisbon > Lisbon (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
(11 more...)

Genre: Research Report (0.93)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts

Neural Information Processing Systems, Aug-19-2025, 05:39:03 GMT

Specifically, we introduce Multiway Transformer, where each block contains a pool of modality-specific experts and a shared self-attention layer.

image-text pair, machine learning, natural language, (17 more...)

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)
North America > Canada > British Columbia > Vancouver (0.04)
(13 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.93)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset

Neural Information Processing Systems, Aug-19-2025, 00:51:25 GMT

As language models grow ever larger, the need for large-scale high-quality text datasets has never been more pressing, especially in multilingual settings.

large language model, machine learning, natural language, (19 more...)

Country:

Europe > Slovenia (0.04)
Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
Europe > Germany > Saxony > Leipzig (0.04)
(29 more...)

Industry: Health & Medicine > Therapeutic Area (0.67)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Data Science (1.00)
Information Technology > Communications > Social Media (1.00)
(4 more...)

When Alignment Hurts: Decoupling Representational Spaces in Multilingual Models

Elshabrawy, Ahmed, Kaing, Hour, Song, Haiyue, Aji, Alham Fikri, Tanaka, Hideki, Utiyama, Masao, Dabre, Raj

arXiv.org Artificial Intelligence, Aug-19-2025

Alignment with high-resource standard languages is often assumed to aid the modeling of related low-resource varieties. We challenge this assumption by demonstrating that excessive representational entanglement with a dominant variety, such as Modern Standard Arabic (MSA) in relation to Arabic dialects, can actively hinder generative modeling. We present the first comprehensive causal study of this phenomenon by analyzing and directly intervening in the internal representation geometry of large language models (LLMs). Our key contribution is an online variational probing framework that continuously estimates the subspace of the standard variety during fine-tuning, enabling projection-based decoupling from this space. While our study uses Arabic as a case due to its unusually rich parallel resources across 25 dialects, the broader motivation is methodological: dialectal MT serves as a controlled proxy for generative tasks where comparable multi-variety corpora are unavailable. Across 25 dialects, our intervention improves generation quality by up to +4.9 chrF++ and +2.0 on average compared to standard fine-tuning, despite a measured tradeoff in standard-language performance. These results provide causal evidence that subspace dominance by high-resource varieties can restrict generative capacity for related varieties. More generally, we unify geometric and information-theoretic probing with subspace-level causal interventions, offering practical tools for improving generative modeling in closely related language families and, more broadly, for controlling representational allocation in multilingual and multi-domain LLMs.
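The projection step mentioned in this abstract can be illustrated with a small sketch. It covers only the generic operation of removing a vector's component inside a given subspace. It does not reproduce the paper's online variational probing, and it assumes the subspace is already available as a list of spanning vectors. The names `orthonormalize` and `project_out` are hypothetical, chosen for this illustration.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def orthonormalize(vectors, eps=1e-12):
    """Gram-Schmidt: turn spanning vectors into an orthonormal basis."""
    basis = []
    for v in vectors:
        w = list(v)
        for u in basis:
            c = dot(w, u)
            w = [wi - c * ui for wi, ui in zip(w, u)]
        norm = dot(w, w) ** 0.5
        if norm > eps:  # skip vectors already in the span of the basis
            basis.append([wi / norm for wi in w])
    return basis


def project_out(hidden, subspace_vectors):
    """Return hidden minus its projection onto span(subspace_vectors)."""
    result = list(hidden)
    for u in orthonormalize(subspace_vectors):
        c = dot(result, u)
        result = [ri - c * ui for ri, ui in zip(result, u)]
    return result


# Assumed toy data: a 3-d "hidden state" and a 2-d subspace to remove.
hidden = [3.0, 4.0, 5.0]
subspace = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
decoupled = project_out(hidden, subspace)
```

After the call, `decoupled` is orthogonal to every vector in `subspace`. In a real model the same operation would be applied to hidden states with an estimated subspace, which is the part this sketch leaves out.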

artificial intelligence, large language model, natural language, (20 more...)

arXiv:2508.12803

Country:

Africa (0.68)
North America (0.68)
Asia > Middle East > Iraq (0.28)

Genre:

Research Report > Experimental Study (0.46)
Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

From SALAMANDRA to SALAMANDRATA: BSC Submission for WMT25 General Machine Translation Shared Task

Gilabert, Javier Garcia, Liao, Xixian, Da Dalt, Severino, Bohman, Ella, Mash, Audrey, Fornaciari, Francesca De Luca, Baucells, Irene, Llop, Joan, Argote, Miguel Claramunt, Escolano, Carlos, Melero, Maite

arXiv.org Artificial Intelligence, Aug-19-2025

In this paper, we present the SALAMANDRATA family of models, an improved iteration of SALAMANDRA LLMs (Gonzalez-Agirre et al., 2025) specifically trained to achieve strong performance in translation-related tasks for 38 European languages. SALAMANDRATA comes in two scales: 2B and 7B parameters. For both versions, we applied the same training recipe with a first step of continual pre-training on parallel data, and a second step of supervised fine-tuning on high-quality instructions. The BSC submission to the WMT25 General Machine Translation shared task is based on the 7B variant of SALAMANDRATA. We first adapted the model vocabulary to support the additional non-European languages included in the task. This was followed by a second phase of continual pre-training and supervised fine-tuning, carefully designed to optimize performance across all translation directions for this year's shared task. For decoding, we employed two quality-aware strategies: Minimum Bayes Risk Decoding and Tuned Re-ranking, using COMET and COMET-KIWI, respectively. We publicly release both the 2B and 7B versions of SALAMANDRATA, along with the newer SALAMANDRATA-V2 model, on Hugging Face.
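Minimum Bayes Risk decoding, named in this abstract, picks the candidate translation that agrees most with the other candidates under some utility metric. The sketch below shows that selection rule only. It assumes a simple token-overlap F1 as a stand-in utility instead of COMET, and the function names `overlap_f1` and `mbr_select` are hypothetical, not taken from the paper or any library.

```python
from collections import Counter


def overlap_f1(hyp: str, ref: str) -> float:
    """Token-level F1 between two strings (assumed stand-in for a learned metric)."""
    h, r = hyp.split(), ref.split()
    if not h or not r:
        return 0.0
    common = sum((Counter(h) & Counter(r)).values())
    if common == 0:
        return 0.0
    precision, recall = common / len(h), common / len(r)
    return 2 * precision * recall / (precision + recall)


def mbr_select(candidates, utility=overlap_f1):
    """Return the candidate with the highest mean utility against all others."""
    if not candidates:
        raise ValueError("need at least one candidate")
    if len(candidates) == 1:
        return candidates[0]
    best, best_score = None, float("-inf")
    for i, hyp in enumerate(candidates):
        # Treat every other candidate as a pseudo-reference.
        others = [c for j, c in enumerate(candidates) if j != i]
        score = sum(utility(hyp, ref) for ref in others) / len(others)
        if score > best_score:
            best, best_score = hyp, score
    return best


# Assumed toy candidates, e.g. samples drawn from a translation model.
samples = [
    "the cat sat on the mat",
    "the cat sat on a mat",
    "a dog stood outside",
]
choice = mbr_select(samples)
```

Here `choice` is the second sample, since it overlaps with both of the others. The cost grows quadratically with the number of candidates, which matters more when the utility is a neural metric rather than this cheap overlap score.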

computational linguistic, machine learning, natural language, (13 more...)

arXiv:2508.12774

Country:

Europe (1.00)
Asia (1.00)
North America > United States (0.93)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

f18a6d1cde4b205199de8729a6637b42-Paper.pdf

Neural Information Processing Systems, Aug-18-2025, 19:22:44 GMT

machine learning, natural language, node, (17 more...)

Country:

Asia > China > Beijing > Beijing (0.04)
Asia > India > Karnataka > Bengaluru (0.04)
Asia > China > Anhui Province > Hefei (0.04)

Genre: Research Report (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.46)

A Appendix

Neural Information Processing Systems, Aug-18-2025, 06:28:33 GMT

A.1 Summary of Commonly Used Metrics for Text Generation

Table 1: Summary of commonly used metrics for text generation. For settings and tasks, we only list the ones justified by the original paper for each metric.

We conduct experiments on WMT19, and the results are shown in Tab. 2. We don't observe ...

A.3 Prompt Set

In Tab. 3, we list the full prompt set for both the s → h direction and the h → r direction.

Prompt set, s → h: Last, Tersely, Succinctly, In summation, To put it succinctly, After, In brief, All in all, To summarize, Bringing up the rear, Behind, In short, In outline, In a nutshell, To come to the point, Lastly, Concisely, In closing, In conclusion, In the final analysis, In sum, In precis, In passing, In winding up, Without wasting words, To end, In a word, To conclude, Last in order, At the end of the day, Curtly, Compactly, Summarising, In a few words, Without waste of words, Crisply, Summarily, In the rear, As a final point, Finally yet importantly, At last, To sum up, Summarizing, Not least of all, To put it in a nutshell, Pithily, Basically, Laconically, To put it briefly, When all is said and done, Shortly, In the end, At the rear, Not to mince words, To cut a long story short, In fine, At the end, To be brief, Last but not least, Not to beat about the bush, Finally, In essence, Last of all, Just as importantly, In drawing things to a close, Briefly, Ultimately, Elliptically, To put it concisely, Not to put too fine a point on it

Prompt set, h → r: As, To wit, As it were, Case in point, As an illustration, sc., That is, Especially, That is to say, To give an example, i.e.

artificial intelligence, metric, natural language, (9 more...)

Country: Europe > Denmark > Capital Region > Copenhagen (0.05)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.30)

df9028fcb6b065e000ffe8a4f03eeb38-Supplemental.pdf

Neural Information Processing Systems, Aug-18-2025, 01:27:52 GMT

artificial intelligence, machine learning, natural language, (18 more...)

Country:

Europe > France > Île-de-France > Paris > Paris (0.04)
Pacific Ocean > North Pacific Ocean > San Francisco Bay (0.04)
North America > United States > Colorado (0.04)
(3 more...)

Genre: Research Report (0.46)

Industry: Leisure & Entertainment > Sports > Football (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)

df9028fcb6b065e000ffe8a4f03eeb38-Paper.pdf

Neural Information Processing Systems, Aug-18-2025, 01:27:49 GMT

artificial intelligence, machine learning, natural language, (18 more...)

Country:

Europe > France > Île-de-France > Paris > Paris (0.04)
Pacific Ocean > North Pacific Ocean > San Francisco Bay (0.04)
North America > United States > Colorado (0.04)
(3 more...)

Industry: Leisure & Entertainment > Sports > Football (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)

Controllable Text Generation with Neurally-Decomposed Oracle

Neural Information Processing Systems, Aug-18-2025, 00:14:45 GMT

Auto-regressive language models have been widely used for text generation. With the recent development of large-scale pre-trained language models (Radford et al., 2019; Brown et al., 2020; Raffel et al., 2020; Lewis et al., 2020), they have achieved state-of-the-art performances in applications

artificial intelligence, machine learning, natural language, (19 more...)

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)

Genre: Research Report (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
