Machine Translation
FlowletFormer: Network Behavioral Semantic Aware Pre-training Model for Traffic Classification
Liu, Liming, Li, Ruoyu, Li, Qing, Hou, Meijia, Jiang, Yong, Xu, Mingwei
Network traffic classification using pre-training models has shown promising results, but existing methods struggle to capture packet structural characteristics, flow-level behaviors, hierarchical protocol semantics, and inter-packet contextual relationships. To address these challenges, we propose FlowletFormer, a BERT -based pre-training model specifically designed for network traffic analysis. FlowletFormer introduces a Coherent Behavior-A ware Traffic Representation Model for segmenting traffic into semantically meaningful units, a Protocol Stack Alignment-Based Embedding Layer to capture multilayer protocol semantics, and Field-Specific and Context-A ware Pretraining Tasks to enhance both inter-packet and inter-flow learning. Experimental results demonstrate that FlowletFormer significantly outperforms existing methods in the effectiveness of traffic representation, classification accuracy, and few-shot learning capability. Moreover, by effectively integrating domain-specific network knowledge, FlowletFormer shows better comprehension of the principles of network transmission (e.g., stateful connections of TCP), providing a more robust and trustworthy framework for traffic analysis.
Bangla-Bayanno: A 52K-Pair Bengali Visual Question Answering Dataset with LLM-Assisted Translation Refinement
Hasan, Mohammed Rakibul, Majid, Rafi, Tahmid, Ahanaf
In this paper, we introduce Bangla-Bayanno, an open-ended Visual Question Answering (VQA) Dataset in Bangla, a widely used, low-resource language in multimodal AI research. The majority of existing datasets are either manually annotated with an emphasis on a specific domain, query type, or answer type or are constrained by niche answer formats. In order to mitigate human-induced errors and guarantee lucidity, we implemented a multilingual LLM-assisted translation refinement pipeline. This dataset overcomes the issues of low-quality translations from multilingual sources. The dataset comprises 52,650 question-answer pairs across 4750+ images. Questions are classified into three distinct answer types: nominal (short descriptive), quantitative (numeric), and polar (yes/no). Bangla-Bayanno provides the most comprehensive open-source, high-quality VQA benchmark in Bangla, aiming to advance research in low-resource multimodal learning and facilitate the development of more inclusive AI systems.
Improving Low-Resource Translation with Dictionary-Guided Fine-Tuning and RL: A Spanish-to-Wayuunaiki Study
Mosquera, Manuel, Robles, Melissa, Rodriguez, Johan, Manrique, Ruben
Low-resource machine translation remains a significant challenge for large language models (LLMs), which often lack exposure to these languages during pretraining and have limited parallel data for fine-tuning. We propose a novel approach that enhances translation for low-resource languages by integrating an external dictionary tool and training models end-to-end using reinforcement learning, in addition to supervised fine-tuning. Focusing on the Spanish-Wayuunaiki language pair, we frame translation as a tool-augmented decision-making problem in which the model can selectively consult a bilingual dictionary during generation. Our method combines supervised instruction tuning with Guided Reward Policy Optimization (GRPO), enabling the model to learn both when and how to use the tool effectively. BLEU similarity scores are used as rewards to guide this learning process. Preliminary results show that our tool-augmented models achieve up to +3.37 BLEU improvement over previous work, and a 18% relative gain compared to a supervised baseline without dictionary access, on the Spanish-Wayuunaiki test set from the AmericasNLP 2025 Shared Task. We also conduct ablation studies to assess the effects of model architecture and training strategy, comparing Qwen2.5-0.5B-Instruct with other models such as LLaMA and a prior NLLB-based system.
Bridging Language Gaps: Enhancing Few-Shot Language Adaptation
Borchert, Philipp, De Weerdt, Jochen, Moens, Marie-Francine
The disparity in language resources poses a challenge in multilingual NLP, with high-resource languages benefiting from extensive data, while low-resource languages lack sufficient data for effective training. Our Contrastive Language Alignment with Prompting (CoLAP) method addresses this gap by integrating contrastive learning with cross-lingual representations, facilitating task-specific knowledge transfer from high-resource to lower-resource languages. The primary advantage of our approach is its data efficiency, enabling rapid adaptation to new languages and reducing the need for large labeled datasets. We conduct experiments with multilingual encoder-only and decoder-only language models on natural language understanding tasks, including natural language inference and relation extraction, evaluating performance across both high- and low-resource languages. Our results demonstrate that CoLAP outperforms few-shot cross-lingual transfer baselines and in-context learning, even with limited available data. This effectively narrows the cross-lingual performance gap, contributing to the development of more efficient multilingual NLP techniques.
A New NMT Model for Translating Clinical Texts from English to Spanish
Li, Rumeng, Wang, Xun, Yu, Hong
Translating electronic health record (EHR) narratives from English to Spanish is a clinically important yet challenging task due to the lack of a parallel-aligned corpus and the abundant unknown words contained. To address such challenges, we propose \textbf{NOOV} (for No OOV), a new neural machine translation (NMT) system that requires little in-domain parallel-aligned corpus for training. NOOV integrates a bilingual lexicon automatically learned from parallel-aligned corpora and a phrase look-up table extracted from a large biomedical knowledge resource, to alleviate both the unknown word problem and the word-repeat challenge in NMT, enhancing better phrase generation of NMT systems. Evaluation shows that NOOV is able to generate better translation of EHR with improvement in both accuracy and fluency.
COMET-poly: Machine Translation Metric Grounded in Other Candidates
Züfle, Maike, Zouhar, Vilém, Dinh, Tu Anh, Polo, Felipe Maia, Niehues, Jan, Sachan, Mrinmaya
Automated metrics for machine translation attempt to replicate human judgment. Unlike humans, who often assess a translation in the context of multiple alternatives, these metrics typically consider only the source sentence and a single translation. This discrepancy in the evaluation setup may negatively impact the performance of automated metrics. We propose two automated metrics that incorporate additional information beyond the single translation. COMET-polycand uses alternative translations of the same source sentence to compare and contrast with the translation at hand, thereby providing a more informed assessment of its quality. COMET-polyic, inspired by retrieval-based in-context learning, takes in translations of similar source texts along with their human-labeled quality scores to guide the evaluation. We find that including a single additional translation in COMET-polycand improves the segment-level metric performance (0.079 to 0.118 Kendall's tau-b correlation), with further gains when more translations are added. Incorporating retrieved examples in COMET-polyic yields similar improvements (0.079 to 0.116 Kendall's tau-b correlation). We release our models publicly.
Evaluating the Impact of Verbal Multiword Expressions on Machine Translation
Liu, Linfeng, Ghosh, Saptarshi, Jiang, Tianyu
Verbal multiword expressions (VMWEs) present significant challenges for natural language processing due to their complex and often non-compositional nature. While machine translation models have seen significant improvement with the advent of language models in recent years, accurately translating these complex linguistic structures remains an open problem. In this study, we analyze the impact of three VMWE categories -- verbal idioms, verb-particle constructions, and light verb constructions -- on machine translation quality from English to multiple languages. Using both established multiword expression datasets and sentences containing these language phenomena extracted from machine translation datasets, we evaluate how state-of-the-art translation systems handle these expressions. Our experimental results consistently show that VMWEs negatively affect translation quality. We also propose an LLM-based paraphrasing approach that replaces these expressions with their literal counterparts, demonstrating significant improvement in translation quality for verbal idioms and verb-particle constructions.
Confidence-Modulated Speculative Decoding for Large Language Models
Sen, Jaydip, Dasgupta, Subhasis, Waghela, Hetvi
-- Speculative decoding has emerged as an effective approach for accelerating autoregressive inference by parallelizing token generation through a draft - then - verify paradigm. However, existing methods rely on static drafting lengths and rigid verification cri teria, limiting their adaptability across varying model uncertainties and input complexities. This paper proposes an information - theoretic framework for speculative decoding based on confidence - modulated drafting. By leveraging entropy and margin - based uncertainty measures over the drafter's output distribution, the proposed method dynamically adjusts the number of speculatively generated tokens at each iteration. This adaptive mechanism reduces rollback frequency, improves resource utilization, an d maintains output fidelity. Additionally, the verification process is modulated using the same confidence signals, enabling more flexible acceptance of drafted tokens without sacrificing generation quality. Experiments on machine translation and summariza tion tasks demonstrate significant speedups over standard speculative decoding while preserving or improving BLEU and ROUGE scores. The proposed approach offers a principled, plug - in method for efficient and robust decoding in large language models under v arying conditions of uncertainty. Keywords -- Speculative Decoding, Autoregressive Models, Confidence Estimation, Adaptive Inference, Entropy - Based Drafting, Sequence Generation, Large Language Models, Large Language Models (LLMs), Information - Theoretic Decoding. The task of sequence generation lies at the heart of numerous applications in natural language processing, including machine translation, text summarization, dialogue generation, and code synthesis. In the overwhelming majority of these applications, autor egressive (AR) decoding remains the dominant paradigm for generating sequences from a probabilistic language model [1 - 2] . Autoregressive models, particularly those based on the Transformer architecture, operate by predicting each token conditioned on the e ntire history of previously generated tokens. This left - to - right decoding strategy, though optimal in terms of likelihood estimation, suffers from a fundamental limitation: the inherently sequential nature of generation prohibits efficient parallelization, severely hindering inference throughput, especially in latency - sensitive deployment scenarios.
Preliminary Ranking of WMT25 General Machine Translation Systems
Kocmi, Tom, Avramidis, Eleftherios, Bawden, Rachel, Bojar, Ondřej, Dranch, Konstantin, Dvorkovich, Anton, Dukanov, Sergey, Fedorova, Natalia, Fishel, Mark, Freitag, Markus, Gowda, Thamme, Grundkiewicz, Roman, Haddow, Barry, Karpinska, Marzena, Koehn, Philipp, Lakougna, Howard, Lundin, Jessica, Murray, Kenton, Nagata, Masaaki, Perrella, Stefano, Proietti, Lorenzo, Popel, Martin, Popović, Maja, Riley, Parker, Shmatova, Mariya, Steingrímsson, Steinþór, Yankovskaya, Lisa, Zouhar, Vilém
We present the preliminary rankings of machine translation (MT) systems submitted to the WMT25 General Machine Translation Shared Task, as determined by automatic evaluation metrics. Because these rankings are derived from automatic evaluation, they may exhibit a bias toward systems that employ re-ranking techniques, such as Quality Estimation or Minimum Bayes Risk decoding. The official WMT25 ranking will be based on human evaluation, which is more reliable and will supersede these results. The official WMT25 ranking will be based on human evaluation, which is more reliable and will supersede these results. The purpose of releasing these findings now is to assist task participants with their system description papers; not to provide final findings.
The Mediomatix Corpus: Parallel Data for Romansh Idioms via Comparable Schoolbooks
Hopton, Zachary, Vamvas, Jannis, Büchler, Andrin, Rutkiewicz, Anna, Cathomas, Rico, Sennrich, Rico
The five idioms (i.e., varieties) of the Romansh language are largely standardized and are taught in the schools of the respective communities in Switzerland. In this paper, we present the first parallel corpus of Romansh idioms. The corpus is based on 291 schoolbook volumes, which are comparable in content for the five idioms. We use automatic alignment methods to extract 207k multi-parallel segments from the books, with more than 2M tokens in total. A small-scale human evaluation confirms that the segments are highly parallel, making the dataset suitable for NLP applications such as machine translation between Romansh idioms. We release the parallel and unaligned versions of the dataset under a CC-BY-NC-SA license and demonstrate its utility for machine translation by training and evaluating an LLM on a sample of the dataset.