AITopics

2505.21937

Country:

Europe (1.00)
North America > United States (0.67)
Asia > Middle East > UAE (0.14)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Data Science > Data Mining (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)

Neural Information Processing Systems, May-28-2025, 23:28:41 GMT

Appendix for "R-Drop: Regularized Dropout for Neural Networks"

We provide more detailed settings for the experiments of each task in this part. A.1 Neural Machine Translation. For all the NMT tasks, we use the public datasets from IWSLT competitions. After tokenization, the resulting vocabularies for the IWSLT datasets are near 10k, while for the WMT datasets, the vocabulary size is about 32k. To train the Transformer-based NMT models, we use the transformer_iwslt_de_en configuration for IWSLT translations, which has 6 layers in both encoder and decoder, embedding size 512, feed-forward size 1,024, attention heads 4, dropout value 0.3, and weight decay 0.0001. Label smoothing [12] is adopted with value 0.1. To evaluate the performance, we use multi-bleu.perl
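The hyperparameters quoted in this excerpt can be collected into a small configuration object. The sketch below is illustrative only: the class name, field names, and the `label_smoothed_nll` helper are hypothetical, and the smoothing formula is one common formulation that may differ in detail from the toolkit the authors used.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class IwsltTransformerConfig:
    """Values quoted in the excerpt; the field names are hypothetical."""
    encoder_layers: int = 6
    decoder_layers: int = 6
    embed_dim: int = 512
    ffn_dim: int = 1024
    attention_heads: int = 4
    dropout: float = 0.3
    weight_decay: float = 1e-4
    label_smoothing: float = 0.1


def label_smoothed_nll(log_probs, target, eps):
    """Label-smoothed loss for a single token (one common formulation).

    log_probs: log-probabilities over the vocabulary.
    target: index of the gold token.
    eps: probability mass spread uniformly over the whole vocabulary.
    """
    nll = -log_probs[target]
    uniform = -sum(log_probs) / len(log_probs)
    return (1.0 - eps) * nll + eps * uniform
```

With `eps = 0` the helper reduces to the ordinary negative log-likelihood, so the smoothing term can be read as a penalty on over-confident predictions.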

r-drop, transformer, translation, (16 more...)

Country: Oceania > Australia > New South Wales > Sydney (0.04)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

The Multilingual Divide and Its Impact on Global AI Safety

Peppin, Aidan, Kreutzer, Julia, Sebag, Alice Schoenauer, Marchisio, Kelly, Ermis, Beyza, Dang, John, Cahyawijaya, Samuel, Singh, Shivalika, Goldfarb-Tarrant, Seraphina, Aryabumi, Viraat, Aakanksha, Ko, Wei-Yin, Üstün, Ahmet, Gallé, Matthias, Fadaee, Marzieh, Hooker, Sara

Despite advances in large language model capabilities in recent years, a large gap remains in their capabilities and safety performance for many languages beyond a relatively small handful of globally dominant languages. This paper provides researchers, policymakers and governance experts with an overview of key challenges to bridging the "language gap" in AI and minimizing safety risks across languages. We provide an analysis of why the language gap in AI exists and grows, and how it creates disparities in global AI safety. We identify barriers to addressing these challenges, and recommend how those working in policy and governance can help address safety concerns associated with the language gap by supporting multilingual dataset creation, transparency, and research.

computational linguistic, large language model, machine learning, (20 more...)

2505.21344

Country:

Europe (1.00)
Africa (1.00)
Asia > Middle East (0.93)
North America > United States (0.67)

Genre: Research Report (1.00)

Industry: Government > Regional Government (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

A Representation Level Analysis of NMT Model Robustness to Grammatical Errors

Issam, Abderrahmane, Semerci, Yusuf Can, Scholtes, Jan, Spanakis, Gerasimos

Understanding robustness is essential for building reliable NLP systems. Unfortunately, in the context of machine translation, previous work has mainly focused on documenting robustness failures or improving robustness. In contrast, we study robustness from a model representation perspective by looking at internal model representations of ungrammatical inputs and how they evolve through model layers. For this purpose, we perform Grammatical Error Detection (GED) probing and representational similarity analysis. Our findings indicate that the encoder first detects the grammatical error, then corrects it by moving its representation toward the correct form. To understand what contributes to this process, we turn to the attention mechanism, where we identify what we term Robustness Heads. We find that Robustness Heads attend to interpretable linguistic units when responding to grammatical errors, and that when we fine-tune models for robustness, they tend to rely more on Robustness Heads for updating the ungrammatical word representation.

computational linguistic, machine learning, natural language, (18 more...)

2505.21224

Country:

Europe (1.00)
Asia (1.00)
North America > United States > Minnesota (0.28)

Genre: Research Report > New Finding (0.66)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

TAT-R1: Terminology-Aware Translation with Reinforcement Learning and Word Alignment

Li, Zheng, Zheng, Mao, Song, Mingyang, Yang, Wenjie

Recently, deep reasoning large language models (LLMs) like DeepSeek-R1 have made significant progress in tasks such as mathematics and coding. Inspired by this, several studies have employed reinforcement learning (RL) to enhance models' deep reasoning capabilities and improve machine translation (MT) quality. However, terminology translation, an essential task in MT, remains unexplored in deep reasoning LLMs. In this paper, we propose TAT-R1, a terminology-aware translation model trained with reinforcement learning and word alignment. Specifically, we first extract the keyword translation pairs using a word alignment model. Then we carefully design three types of rule-based alignment rewards with the extracted alignment relationships. With those alignment rewards, the RL-trained translation model can learn to focus on the accurate translation of key information, including terminology in the source text. Experimental results show the effectiveness of TAT-R1. Our model significantly improves terminology translation accuracy compared to the baseline models while maintaining comparable performance on general translation tasks. In addition, we conduct detailed ablation studies of the DeepSeek-R1-like training paradigm for machine translation and reveal several key findings.

large language model, machine learning, translation, (18 more...)

2505.21172

Country:

North America > United States (0.28)
Asia (0.28)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)

SeqPO-SiMT: Sequential Policy Optimization for Simultaneous Machine Translation

Xu, Ting, Huang, Zhichao, Sun, Jiankai, Cheng, Shanbo, Lam, Wai

We present Sequential Policy Optimization for Simultaneous Machine Translation (SeqPO-SiMT), a new policy optimization framework that defines the simultaneous machine translation (SiMT) task as a sequential decision making problem, incorporating a tailored reward to enhance translation quality while reducing latency. In contrast to popular Reinforcement Learning from Human Feedback (RLHF) methods, such as PPO and DPO, which are typically applied in single-step tasks, SeqPO-SiMT effectively tackles the multi-step SiMT task. This intuitive framework allows the SiMT LLMs to simulate and refine the SiMT process using a tailored reward. We conduct experiments on six datasets from diverse domains for En to Zh and Zh to En SiMT tasks, demonstrating that SeqPO-SiMT consistently achieves significantly higher translation quality with lower latency. In particular, SeqPO-SiMT outperforms the supervised fine-tuning (SFT) model by 1.13 points in COMET, while reducing the Average Lagging by 6.17 in the NEWSTEST2021 En to Zh dataset. While SiMT operates with far less context than offline translation, the SiMT results of SeqPO-SiMT on 7B LLM surprisingly rival the offline translation of high-performing LLMs, including Qwen-2.5-7B-Instruct and LLaMA-3-8B-Instruct.
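The latency figure quoted above is reported in Average Lagging. As a rough illustration of what that metric measures, the sketch below implements the standard Average Lagging definition for a single sentence; the function name and the input format are assumptions, and the paper's evaluation setup may use a different variant or tokenization.

```python
def average_lagging(reads, src_len):
    """Average Lagging for one sentence pair.

    reads[i] is the number of source tokens the system had read when it
    emitted target token i (zero-based); src_len is the source length.
    """
    if not reads or src_len <= 0:
        raise ValueError("need at least one target token and a non-empty source")
    rate = len(reads) / src_len  # target-to-source length ratio
    total = 0.0
    for i, read_so_far in enumerate(reads):
        # Lag of this token relative to an ideal policy that keeps pace
        # with the speaker at the given length ratio.
        total += read_so_far - i / rate
        if read_so_far >= src_len:
            # Stop at the first token emitted after the full source is read.
            return total / (i + 1)
    # Fallback if the policy never consumed the whole source.
    return total / len(reads)
```

For example, a wait-3 policy on a five-token source and target gives `average_lagging([3, 4, 5, 5, 5], 5) == 3.0`, while a system that reads the whole source first scores the full source length.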

large language model, machine learning, translation, (17 more...)

2505.20622

Country:

Europe (1.00)
Asia > Middle East (0.67)
North America > United States > Minnesota (0.28)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing Systems, May-27-2025, 17:11:51 GMT

Scaling Sign Language Translation

Sign language translation (SLT) addresses the problem of translating information from a sign language in video to a spoken language in text. Existing studies, while showing progress, are often limited to narrow domains and/or few sign languages and struggle with open-domain tasks. In this paper, we push forward the frontier of SLT by scaling pretraining data, model size, and number of translation directions. We perform large-scale SLT pretraining on different data including 1) noisy multilingual YouTube SLT data, 2) parallel text corpora, and 3) SLT data augmented by translating video captions to other languages with off-the-shelf machine translation models. We unify different pretraining tasks with task-specific prompts under the encoder-decoder architecture, and initialize the SLT model with pretrained (m/By)T5 models across model sizes.

model size, scaling sign language translation, slt, (2 more...)

Industry: Education > Curriculum > Subject-Specific Education (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.98)

Neural Information Processing Systems, May-27-2025, 15:26:58 GMT

Improving Gloss-free Sign Language Translation by Reducing Representation Density

Gloss-free sign language translation (SLT) aims to develop well-performing SLT systems with no requirement for the costly gloss annotations, but currently still lags behind gloss-based approaches significantly. In this paper, we identify a representation density problem that could be a bottleneck in restricting the performance of gloss-free SLT. Specifically, the representation density problem describes that the visual representations of semantically distinct sign gestures tend to be closely packed together in feature space, which makes gloss-free methods struggle with distinguishing different sign gestures and suffer from a sharp performance drop. To address the representation density problem, we introduce a simple but effective contrastive learning strategy, namely SignCL, which encourages gloss-free models to learn more discriminative feature representation in a self-supervised manner. Our experiments demonstrate that the proposed SignCL can significantly reduce the representation density and improve performance across various translation frameworks. Compared to Sign2GPT, a state-of-the-art method based on large-scale pre-trained vision and language models, SignCL achieves better performance with only 35% of its parameters.

gloss-free sign language translation, representation density, representation density problem, (2 more...)

Industry: Education > Curriculum > Subject-Specific Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.80)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.64)

Neural Information Processing Systems, May-27-2025, 14:09:02 GMT

Revisiting Neural Scaling Laws in Language and Vision

The remarkable progress in deep learning in recent years is largely driven by improvements in scale, where bigger models are trained on larger datasets for longer schedules. To predict the benefit of scale empirically, we argue for a more rigorous methodology based on the extrapolation loss, instead of reporting the best-fitting (interpolating) parameters. We then present a recipe for estimating scaling law parameters reliably from learning curves. We demonstrate that it extrapolates more accurately than previous methods in a wide range of architecture families across several domains, including image classification, neural machine translation (NMT) and language modeling, in addition to tasks from the BIG-Bench evaluation benchmark. Finally, we release a benchmark dataset comprising 90 evaluation tasks to facilitate research in this domain.
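The distinction between interpolation and extrapolation can be made concrete with a toy fit. The sketch below is not the paper's estimator: it assumes a plain power law, loss = beta * size^(-c), fits it by least squares in log space on the smaller scales only, and then scores the fit on the held-out larger scales. The function names are hypothetical.

```python
import math


def fit_power_law(sizes, losses):
    """Least-squares fit of log(loss) = log(beta) - c * log(size)."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(v) for v in losses]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x
    beta = math.exp(mean_y - slope * mean_x)
    return beta, -slope  # (beta, c)


def extrapolation_rmse(sizes, losses, n_fit):
    """Fit on the n_fit smallest scales; report log-space RMSE on the rest."""
    pairs = sorted(zip(sizes, losses))
    fit, held_out = pairs[:n_fit], pairs[n_fit:]
    if len(fit) < 2 or not held_out:
        raise ValueError("need at least two fit points and one held-out point")
    beta, c = fit_power_law([s for s, _ in fit], [v for _, v in fit])
    errors = [(math.log(beta * s ** -c) - math.log(v)) ** 2 for s, v in held_out]
    return math.sqrt(sum(errors) / len(errors))
```

A fit that matches the training scales closely can still score poorly here, which is the failure mode that evaluating on held-out larger scales is meant to expose.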

dataset, language and vision, revisiting neural scaling law

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)

Neural Information Processing Systems, May-27-2025, 11:48:29 GMT

Order Matters in the Presence of Dataset Imbalance for Multilingual Learning

In this paper, we empirically study the optimization dynamics of multi-task learning, particularly focusing on those that govern a collection of tasks with significant data imbalance. We present a simple yet effective method of pre-training on high-resource tasks, followed by fine-tuning on a mixture of high/low-resource tasks. We provide a thorough empirical study and analysis of this method's benefits, showing that it achieves consistent improvements relative to the performance trade-off profile of standard static weighting. We analyze under what data regimes this method is applicable and show its improvements empirically in neural machine translation (NMT) and multilingual language modeling.
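The two-stage schedule described above can be sketched as a step-dependent task sampler. Everything specific here is an assumption made for illustration: the function name, the task names in the usage note, and the choice to mix tasks in proportion to their data size during fine-tuning (the abstract does not state the mixing weights).

```python
def task_sampling_probs(step, pretrain_steps, task_sizes, high_resource):
    """Sampling probability per task at a given training step.

    Before pretrain_steps, only the high-resource tasks are sampled.
    Afterwards, all tasks are mixed in proportion to their data size
    (an assumed mixing rule, not taken from the paper).
    """
    if step < pretrain_steps:
        active = {t: n for t, n in task_sizes.items() if t in high_resource}
    else:
        active = dict(task_sizes)
    total = sum(active.values())
    if total <= 0:
        raise ValueError("no data available for the active tasks")
    return {t: active.get(t, 0) / total for t in task_sizes}
```

With hypothetical sizes `{"en-fr": 900, "en-ro": 100}` and `high_resource={"en-fr"}`, the sampler draws only from `en-fr` during the first stage and from both tasks afterwards, in contrast to static weighting, which uses a single fixed mixture throughout.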

artificial intelligence, machine learning, natural language, (4 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)