AITopics

Neural Information Processing Systems, Jan-25-2025, 17:00:14 GMT

Review for NeurIPS paper: Data Diversification: A Simple Strategy For Neural Machine Translation

This work describes a simple approach to synthetically augmenting the training dataset for neural machine translation. The proposed approach involves training multiple forward and backward MT models and appending their outputs to the original training dataset. This augmented (or diversified) training dataset can then be used to train the next generation of models. The proposed approach is simple, achieves good results, and the authors do a good job presenting the idea. The paper is quite empirical and the technique fairly specific to NMT, but it is still interesting to see that simple ideas sometimes work well and thus deserve careful consideration.
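The augmentation step summarized in this review can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `diversify`, the `Translator` type, and the toy string functions standing in for trained MT models are all hypothetical, and details such as deduplication or the number of training rounds are not covered by the review and are left out.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]             # (source sentence, target sentence)
Translator = Callable[[str], str]  # hypothetical stand-in for a trained MT model


def diversify(
    pairs: List[Pair],
    forward_models: List[Translator],
    backward_models: List[Translator],
) -> List[Pair]:
    """Append synthetic pairs from forward and backward models to the data."""
    augmented = list(pairs)  # keep the original parallel data
    for fwd in forward_models:
        # Forward model: translate each source sentence, pair it with the source.
        augmented.extend((src, fwd(src)) for src, _ in pairs)
    for bwd in backward_models:
        # Backward model: translate each target sentence, pair it with the target.
        augmented.extend((bwd(tgt), tgt) for _, tgt in pairs)
    return augmented


# Toy usage: plain string functions play the role of trained models.
toy_pairs = [("hallo", "hello")]
toy_forward = [str.upper, lambda s: s + "!"]
toy_backward = [str.lower]
toy_augmented = diversify(toy_pairs, toy_forward, toy_backward)
```

With k forward and k backward models, the augmented set holds (2k + 1) times as many pairs as the original, which is then used to train the final model.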

data diversification, neural machine translation, training dataset, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

arXiv.org Artificial Intelligence, Jan-25-2025

Speech Translation Refinement using Large Language Models

Dou, Huaixia, Tian, Xinyu, Lyu, Xinglin, Zhu, Jie, Li, Junhui, Guo, Lifan

Recent advancements in large language models (LLMs) have demonstrated their remarkable capabilities across various language tasks. Inspired by the success of text-to-text translation refinement, this paper investigates how LLMs can improve the performance of speech translation by introducing a joint refinement process. Through the joint refinement of speech translation (ST) and automatic speech recognition (ASR) transcription via LLMs, the performance of the ST model is significantly improved in both training-free in-context learning and parameter-efficient fine-tuning scenarios. Additionally, we explore the effect of document-level context on refinement under the context-aware fine-tuning scenario. Experimental results on the MuST-C and CoVoST 2 datasets, which include seven translation tasks, demonstrate the effectiveness of the proposed approach using several popular LLMs including GPT-3.5-turbo, LLaMA3-8B, and Mistral-12B. Further analysis suggests that jointly refining both transcription and translation yields better performance than refining translation alone. Meanwhile, incorporating document-level context significantly enhances refinement performance. We release our code and datasets on GitHub.

large language model, machine learning, translation, (19 more...)

2501.1509

Country: Asia > China > Zhejiang Province > Hangzhou (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.92)

arXiv.org Artificial Intelligence, Jan-25-2025

Faster Machine Translation Ensembling with Reinforcement Learning and Competitive Correction

Prasad, Kritarth, Zaki, Mohammadi, Singh, Pratik, Wasnik, Pankaj

Ensembling neural machine translation (NMT) models to produce higher-quality translations than the $L$ individual models has been extensively studied. Recent methods typically employ a candidate selection block (CSB) and an encoder-decoder fusion block (FB), requiring inference across all candidate models, leading to significant computational overhead, generally $\Omega(L)$. This paper introduces SmartGen, a reinforcement learning (RL)-based strategy that improves the CSB by selecting a small, fixed number of candidates and identifying optimal groups to pass to the fusion block for each input sentence. Furthermore, the CSB and FB were previously trained independently, leading to suboptimal NMT performance. Our DQN-based SmartGen addresses this by using feedback from the FB as a reward during training. We also resolve a key issue in earlier methods, where candidates were passed to the FB without modification, by introducing a Competitive Correction Block (CCB). Finally, we validate our approach with extensive experiments on English-Hindi translation tasks in both directions.

machine learning, natural language, translation, (17 more...)

2501.15219

Country:

North America > United States > Maryland > Baltimore (0.04)
North America > Mexico > Mexico City > Mexico City (0.04)
North America > Canada > Ontario > Toronto (0.04)
(3 more...)

Genre: Research Report (0.64)

Industry: Energy (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Neural Information Processing Systems, Jan-24-2025, 19:24:40 GMT

Reviews: Fast Structured Decoding for Sequence Models

The paper proposes to boost the translation quality of a non-autoregressive (NART) neural machine translation system through a conditional random field (CRF) that is attached to the decoder. The CRF reduces the translation quality drop compared to autoregressive neural translation systems by imposing a bigram-language-model-like structure onto the decoder, which helps to alleviate the strong independence assumption that NART architectures entail. The CRF is jointly trained with all other parameters of the neural network. Experiments conducted on WMT14 and IWSLT14 En-De and De-En tasks are reported to yield improvements of more than 6 BLEU points over their corresponding baselines. By augmenting the decoder with a Markov-order 1 CRF, the resulting network is, strictly speaking, no longer a non-autoregressive system.
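To make the "Markov-order 1" structure concrete, the sketch below shows textbook Viterbi decoding over a linear-chain CRF: each position has per-label (unary) scores, and a transition matrix scores adjacent label pairs, which is the bigram-like dependency the review mentions. The review does not describe the paper's actual decoding procedure, so this is only an illustration; the function name `viterbi` and the toy scores are assumptions.

```python
from typing import List


def viterbi(unary: List[List[float]], trans: List[List[float]]) -> List[int]:
    """Return the highest-scoring label sequence of a linear-chain CRF.

    unary[t][j] is the score of label j at position t;
    trans[i][j] is the score of moving from label i to label j.
    """
    n_labels = len(trans)
    score = list(unary[0])            # best score of any path ending in each label
    backpointers: List[List[int]] = []
    for t in range(1, len(unary)):
        new_score, ptr = [], []
        for j in range(n_labels):
            # Best previous label for reaching label j at position t.
            best_i = max(range(n_labels), key=lambda i: score[i] + trans[i][j])
            new_score.append(score[best_i] + trans[best_i][j] + unary[t][j])
            ptr.append(best_i)
        score = new_score
        backpointers.append(ptr)
    # Trace back from the best final label.
    path = [max(range(n_labels), key=lambda j: score[j])]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    return path[::-1]


# Toy usage with two labels and two positions.
toy_unary = [[2.0, 0.0], [0.0, 1.0]]
independent = viterbi(toy_unary, [[0.0, 0.0], [0.0, 0.0]])
with_bigram = viterbi(toy_unary, [[0.0, -5.0], [0.0, 0.0]])
```

With zero transition scores, each position is decoded independently; penalizing the transition from label 0 to label 1 changes the second position's choice, which is how pairwise scores relax the independence assumption.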

bleu score, fast structured decoding, sequence model, (9 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

arXiv.org Artificial Intelligence, Jan-24-2025

Comparable Corpora: Opportunities for New Research Directions

Church, Kenneth

Most conference papers present new results, but this paper will focus more on opportunities for the audience to make their own contributions. This paper is intended to challenge the community to think more broadly about what we can do with comparable corpora. We will start with a review of the history, and then suggest new directions for future research. This was a keynote at BUCC-2025, a workshop associated with Coling-2025.

artificial intelligence, machine learning, natural language, (16 more...)

2501.14721

Country:

Asia > China > Hong Kong (0.05)
North America > United States > New York (0.04)
Europe > Finland > Uusimaa > Helsinki (0.04)
(12 more...)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.96)
Information Technology > Communications (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

Domain-Specific Machine Translation to Translate Medicine Brochures in English to Sorani Kurdish

Shamal, Mariam, Hassani, Hossein

Access to Kurdish medicine brochures is limited, depriving Kurdish-speaking communities of critical health information. To address this problem, we developed a specialized Machine Translation (MT) model to translate English medicine brochures into Sorani Kurdish using a parallel corpus of 22,940 aligned sentence pairs from 319 brochures, sourced from two pharmaceutical companies in the Kurdistan Region of Iraq (KRI). We trained a Statistical Machine Translation (SMT) model using the Moses toolkit, conducting seven experiments that resulted in BLEU scores ranging from 22.65 to 48.93. We translated three new brochures to improve the evaluation process and encountered unknown words. We addressed unknown words through post-processing with a medical dictionary, resulting in BLEU scores of 56.87, 31.05, and 40.01. Human evaluation by native Kurdish-speaking pharmacists, physicians, and medicine users showed that 50% of professionals found the translations consistent, while 83.3% rated them accurate. Among users, 66.7% considered the translations clear and felt confident using the medications.
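The dictionary-based post-processing step mentioned in this abstract could look roughly like the sketch below. It assumes that unknown words survive in the system output as untranslated English tokens and that the medical dictionary is a simple word-to-word mapping; the abstract confirms neither detail. The function name `replace_unknown_words` and the placeholder entries (`<TERM_1>` and so on) are hypothetical and stand in for real Sorani Kurdish terms.

```python
from typing import Dict


def replace_unknown_words(translation: str, dictionary: Dict[str, str]) -> str:
    """Replace leftover source-language tokens using a bilingual dictionary.

    Tokens found in the dictionary (case-insensitively) are swapped for their
    entry; every other token is kept unchanged.
    """
    tokens = translation.split()
    return " ".join(dictionary.get(token.lower(), token) for token in tokens)


# Toy usage: uppercase placeholders stand in for already-translated words.
toy_dictionary = {"paracetamol": "<TERM_1>", "dosage": "<TERM_2>"}
toy_output = "AAA Paracetamol BBB dosage CCC"
toy_fixed = replace_unknown_words(toy_output, toy_dictionary)
```

A whitespace tokenizer is the simplest choice here; multi-word terms or tokens with attached punctuation would need a more careful matching strategy.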

artificial intelligence, brochure, natural language, (14 more...)

2501.13609

Country:

Europe > Middle East (0.04)
Europe > Greece (0.04)
Europe > France > Occitanie > Hérault > Montpellier (0.04)
(4 more...)

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.78)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

CRPO: Confidence-Reward Driven Preference Optimization for Machine Translation

Cui, Guofeng, Wang, Pichao, Liu, Yang, Ke, Zemian, Liu, Zhu, Bhat, Vimal

Large language models (LLMs) have shown great potential in natural language processing tasks, but their application to machine translation (MT) remains challenging due to pretraining on English-centric data and the complexity of reinforcement learning from human feedback (RLHF). Direct Preference Optimization (DPO) has emerged as a simpler and more efficient alternative, but its performance depends heavily on the quality of preference data. To address this, we propose Confidence-Reward driven Preference Optimization (CRPO), a novel method that combines reward scores with model confidence to improve data selection for fine-tuning. CRPO selects challenging sentence pairs where the model is uncertain or underperforms, leading to more effective learning. While primarily designed for LLMs, CRPO also generalizes to encoder-decoder models like NLLB, demonstrating its versatility. Empirical results show that CRPO outperforms existing methods such as RS-DPO, RSO and MBR score in both translation accuracy and data efficiency.

artificial intelligence, machine learning, natural language, (18 more...)

2501.13927

Genre:

Research Report > Promising Solution (0.34)
Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

LoCoML: A Framework for Real-World ML Inference Pipelines

Maddireddy, Kritin, Methukula, Santhosh Kotekal, Sridhar, Chandrasekar, Vaidhyanathan, Karthik

The widespread adoption of machine learning (ML) has brought forth diverse models with varying architectures and data requirements, introducing new challenges in integrating these systems into real-world applications. Traditional solutions often struggle to manage the complexities of connecting heterogeneous models, especially when dealing with varied technical specifications. These limitations are amplified in large-scale, collaborative projects where stakeholders contribute models with different technical specifications. To address these challenges, we developed LoCoML, a low-code framework designed to simplify the integration of diverse ML models within the context of the Bhashini Project, a large-scale initiative aimed at integrating AI-driven language technologies such as automatic speech recognition, machine translation, text-to-speech, and optical character recognition to support seamless communication across more than 20 languages. Initial evaluations show that LoCoML adds only a small amount of computational load, making it efficient and effective for large-scale ML integration. Our experience suggests that a low-code approach can be a practical solution for connecting multiple ML models in a collaborative environment.

artificial intelligence, machine learning, natural language, (21 more...)

2501.14165

Country: Asia > India > Telangana > Hyderabad (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.89)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.69)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.67)

A Survey of Code-switched Arabic NLP: Progress, Challenges, and Future Directions

Hamed, Injy, Sabty, Caroline, Abdennadher, Slim, Vu, Ngoc Thang, Solorio, Thamar, Habash, Nizar

Language in the Arab world presents a complex diglossic and multilingual setting, involving the use of Modern Standard Arabic, various dialects and sub-dialects, as well as multiple European languages. This diverse linguistic landscape has given rise to code-switching, both within Arabic varieties and between Arabic and foreign languages. The widespread occurrence of code-switching across the region makes it vital to address these linguistic needs when developing language technologies. In this paper, we provide a review of the current literature in the field of code-switched Arabic NLP, offering a broad perspective on ongoing efforts, challenges, research gaps, and recommendations for future research directions.

large language model, machine learning, natural language, (24 more...)

2501.13419

Country:

Africa > Sudan (0.14)
Asia > Middle East > Saudi Arabia (0.04)
Europe > Germany > Baden-Württemberg > Stuttgart Region > Stuttgart (0.04)
(17 more...)

Genre: Overview (1.00)

Industry:

Information Technology (0.93)
Education > Curriculum > Subject-Specific Education (0.93)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.94)
(5 more...)