 Machine Translation


Skeleton-Guided-Translation: A Benchmarking Framework for Code Repository Translation with Fine-Grained Quality Evaluation

arXiv.org Artificial Intelligence

The advancement of large language models has intensified the need to modernize enterprise applications and migrate legacy systems to secure, versatile languages. However, existing code translation benchmarks primarily focus on individual functions, overlooking the complexities involved in translating entire repositories, such as maintaining inter-module coherence and managing dependencies. While some recent repository-level translation benchmarks attempt to address these challenges, they still face limitations, including poor maintainability and overly coarse evaluation granularity, which make them less developer-friendly. We introduce Skeleton-Guided-Translation, a framework for repository-level Java to C# code translation with fine-grained quality evaluation. It uses a two-step process: first translating the repository's structural "skeletons", then translating the full repository guided by these skeletons. Building on this, we present TRANSREPO-BENCH, a benchmark of high-quality open-source Java repositories and their corresponding C# skeletons, including matching unit tests and build configurations. Our unit tests are fixed and can be applied across multiple or incremental translations without manual adjustments, enhancing automation and scalability in evaluations. Additionally, we develop fine-grained evaluation metrics that assess translation quality at the individual test case level, addressing traditional binary metrics' inability to distinguish when build failures cause all tests to fail. Evaluations using TRANSREPO-BENCH highlight key challenges and advance more accurate repository-level code translation.
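
The contrast between a binary repository-level metric and a test-case-level metric can be illustrated with a minimal sketch. The abstract does not give the exact scoring formula, so everything here is an assumption: the `CaseResult` record and the functions `binary_score` and `fine_grained_score` are hypothetical names, and the fine-grained score is simply taken to be the fraction of test cases that both build and pass.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CaseResult:
    """Outcome of one unit test case (hypothetical record, not from the paper)."""
    name: str
    built: bool   # did the translated code needed by this test compile?
    passed: bool  # did the test pass (only meaningful when built is True)?


def binary_score(results: List[CaseResult]) -> int:
    """Coarse metric: 1 only if every test case built and passed, else 0."""
    return int(bool(results) and all(r.built and r.passed for r in results))


def fine_grained_score(results: List[CaseResult]) -> float:
    """Assumed fine-grained metric: fraction of test cases that built and passed."""
    if not results:
        return 0.0
    return sum(1 for r in results if r.built and r.passed) / len(results)


if __name__ == "__main__":
    results = [
        CaseResult("testParse", True, True),
        CaseResult("testFormat", True, True),
        CaseResult("testRoundTrip", True, True),
        CaseResult("testLegacyApi", False, False),  # build failure in one unit
    ]
    print(binary_score(results), fine_grained_score(results))
```

Under these assumptions, a single build failure drives the binary score to 0 while the per-test score still credits the three test cases that pass.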


DialUp! Modeling the Language Continuum by Adapting Models to Dialects and Dialects to Models

arXiv.org Artificial Intelligence

Most of the world's languages and dialects are low-resource, and lack support in mainstream machine translation (MT) models. However, many of them have a closely-related high-resource language (HRL) neighbor, and differ in linguistically regular ways from it. This underscores the importance of model robustness to dialectical variation and cross-lingual generalization to the HRL dialect continuum. We present DialUp, consisting of a training-time technique for adapting a pretrained model to dialectical data (M->D), and an inference-time intervention adapting dialectical data to the model expertise (D->M). M->D induces model robustness to potentially unseen and unknown dialects by exposure to synthetic data exemplifying linguistic mechanisms of dialectical variation, whereas D->M treats dialectical divergence for known target dialects. These methods show considerable performance gains for several dialects from four language families, and modest gains for two other language families. We also conduct feature and error analyses, which show that language varieties with low baseline MT performance are more likely to benefit from these approaches.


A comparison of data filtering techniques for English-Polish LLM-based machine translation in the biomedical domain

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have become state-of-the-art in Machine Translation (MT). They are often trained on massive bilingual parallel corpora scraped from the web, which contain low-quality entries and redundant information, leading to significant computational challenges. Various data filtering methods exist to reduce dataset sizes, but their effectiveness varies considerably across language pairs and domains. This paper evaluates the impact of commonly used data filtering techniques, such as LASER, MUSE, and LaBSE, on English-Polish translation within the biomedical domain. By filtering the UFAL Medical Corpus, we created varying dataset sizes to fine-tune the mBART50 model, which was then evaluated using the SacreBLEU metric on the Khresmoi dataset, with the quality of translations additionally assessed by bilingual speakers. Our results show that both LASER and MUSE can significantly reduce dataset sizes while maintaining or even enhancing performance. We recommend the use of LASER, as it consistently outperforms the other methods and provides the most fluent and natural-sounding translations.
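
Embedding-based filtering of a parallel corpus can be sketched as follows. This is an illustration under stated assumptions, not the paper's pipeline: the sentence vectors are toy values standing in for LASER, MUSE, or LaBSE embeddings, plain cosine similarity is assumed as the pair score, and the helper names `cosine`, `filter_pairs`, and `keep_top_fraction` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# (source sentence, target sentence, source vector, target vector)
Pair = Tuple[str, str, Sequence[float], Sequence[float]]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def filter_pairs(pairs: List[Pair], threshold: float) -> List[Tuple[str, str]]:
    """Keep sentence pairs whose embedding similarity reaches the threshold."""
    return [(s, t) for s, t, sv, tv in pairs if cosine(sv, tv) >= threshold]


def keep_top_fraction(pairs: List[Pair], fraction: float) -> List[Tuple[str, str]]:
    """Keep the best-scoring fraction of pairs, e.g. to build smaller datasets."""
    ranked = sorted(pairs, key=lambda p: cosine(p[2], p[3]), reverse=True)
    k = round(len(ranked) * fraction)
    return [(s, t) for s, t, _, _ in ranked[:k]]


if __name__ == "__main__":
    toy = [
        ("fever", "gorączka", [1.0, 0.0], [1.0, 0.0]),     # well aligned
        ("click here", "rak płuc", [1.0, 0.0], [0.0, 1.0]),  # misaligned noise
        ("headache", "ból głowy", [1.0, 1.0], [1.0, 0.0]),  # partially aligned
    ]
    print(filter_pairs(toy, threshold=0.5))
    print(keep_top_fraction(toy, fraction=2 / 3))
```

Varying `threshold` or `fraction` is one simple way to obtain filtered corpora of different sizes for fine-tuning comparisons.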


Evaluation of NMT-Assisted Grammar Transfer for a Multi-Language Configurable Data-to-Text System

arXiv.org Artificial Intelligence

One approach for multilingual data-to-text generation is to translate grammatical configurations upfront from the source language into each target language. These configurations are then used by a surface realizer and in document planning stages to generate output. In this paper, we describe a rule-based NLG implementation of this approach where the configuration is translated by Neural Machine Translation (NMT) combined with a one-time human review, and introduce a cross-language grammar dependency model to create a multilingual NLG system that generates text from the source data, scaling the generation phase without a human in the loop. Additionally, we introduce a method for human post-editing evaluation on the automatically translated text. Our evaluation on the SportSett:Basketball dataset shows that our NLG system performs well, underlining its grammatical correctness in translation tasks.


AdaCoT: Rethinking Cross-Lingual Factual Reasoning through Adaptive Chain-of-Thought

arXiv.org Artificial Intelligence

Large language models (LLMs) have shown impressive multilingual capabilities through pretraining on diverse corpora. While these models show strong reasoning abilities, their performance varies significantly across languages due to uneven training data distribution. Existing approaches that rely on machine translation, extensive multilingual pretraining, and cross-lingual tuning face scalability challenges and often fail to capture nuanced reasoning processes across languages. In this paper, we introduce AdaCoT (Adaptive Chain-of-Thought), a framework that enhances multilingual reasoning by dynamically routing thought processes through intermediary "thinking languages" before generating target-language responses. AdaCoT leverages a language-agnostic core and incorporates an adaptive, reward-based mechanism for selecting optimal reasoning pathways without requiring additional pretraining. Our comprehensive evaluation across multiple benchmarks demonstrates substantial improvements in both factual reasoning quality and cross-lingual consistency, with particularly strong performance gains in low-resource language settings. The results suggest that adaptive reasoning paths can effectively bridge the performance gap between high- and low-resource languages while maintaining cultural and linguistic nuances.


Review for NeurIPS paper: TSPNet: Hierarchical Feature Learning via Temporal Semantic Pyramid for Sign Language Translation

Neural Information Processing Systems

Weaknesses: W1 The submission claims that existing approaches only capture spatial appearance (line 42), but the approach it is compared with, [2], is actually based on RNNs, which have the potential to capture motion information across a sequence of frames. W2 While the work acknowledges the challenges of motion blur and fine-grained gesture details (line 40), it does not address them in the proposed approach. W3 The quantitative gains in terms of BLEU (9.58 to 13.41) and ROUGE (31.80 to 34.96) scores are not outstanding. W4 The results of [2], obtained by exploiting the glosses available in the dataset, are better than the ones in this submission. Given that the contributions of the work address the visual representation, it is not argued why the proposed techniques are not also assessed with the Sign-to-Gloss-to-Text setup considered in [2].


Review for NeurIPS paper: TSPNet: Hierarchical Feature Learning via Temporal Semantic Pyramid for Sign Language Translation

Neural Information Processing Systems

The reviewers were positive about the ideas in the paper and mostly debated the merits of the evaluation. For one, they were not fully convinced by the arguments in the rebuttal about the differences between the sharpness of boundaries for action localization and sign language translation. For the camera-ready version, I would suggest better addressing this point, as well as comparing with, or justifying differences from, "Sign Language Transformers: Joint End-to-end Sign Language Recognition and Translation", Camgoz et al., CVPR 2020. One final suggestion is to add results with one more video encoder in addition to I3D.


Reviews: Large Memory Layers with Product Keys

Neural Information Processing Systems

UPDATE: The authors answered my questions. I would like to keep my score unchanged and suggest focusing on the clarity of the final version. Perhaps this is a case where I would really be interested in looking at the source code. Originality: the paper borrows the general idea of product keys from the database community; however, the application to fast retrieval in neural memory systems seems quite novel to me. Quality: The core ideas of the paper are sound; however, I would appreciate more rigor in both the conceptual and the experimental comparison with other approaches incorporating memory into the Transformer (see e.g. [...]). Another suggestion would be to discuss in more detail the issue of potential non-uniformity of the query distribution, which indeed seems to be quite relevant.


Visualizing Uncertainty in Translation Tasks: An Evaluation of LLM Performance and Confidence Metrics

arXiv.org Artificial Intelligence

Large language models (LLMs) are increasingly utilized for machine translation, yet their predictions often exhibit uncertainties that hinder interpretability and user trust. Effectively visualizing these uncertainties can enhance the usability of LLM outputs, particularly in contexts where translation accuracy is critical. This paper addresses two primary objectives: (1) providing users with token-level insights into model confidence and (2) developing a web-based visualization tool to quantify and represent translation uncertainties. To achieve these goals, we utilized the T5 model with the WMT19 dataset for translation tasks and evaluated translation quality using established metrics such as BLEU, METEOR, and ROUGE. We introduced three novel uncertainty quantification (UQ) metrics: (1) the geometric mean of token probabilities, (2) the arithmetic mean of token probabilities, and (3) the arithmetic mean of the kurtosis of token distributions. These metrics provide a simple yet effective framework for evaluating translation performance. Our analysis revealed a linear relationship between the traditional evaluation metrics and our UQ metrics, demonstrating the validity of our approach. Additionally, we developed an interactive web-based visualization that uses a color gradient to represent token confidence. This tool offers users a clear and intuitive understanding of translation quality while providing valuable insights into model performance. Overall, we show that our UQ metrics and visualization are both robust and interpretable, offering practical tools for evaluating and accessing machine translation systems.
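
The three uncertainty metrics named above can be sketched in a few lines. The abstract does not specify the kurtosis convention, so this sketch assumes the non-excess (Pearson) kurtosis computed over the probability values of each token's output distribution; the function names are hypothetical, and the inputs are toy numbers rather than real T5 outputs.

```python
import math
from typing import Sequence


def geometric_mean_prob(token_probs: Sequence[float]) -> float:
    """Geometric mean of chosen-token probabilities (all must be > 0)."""
    return math.exp(sum(math.log(p) for p in token_probs) / len(token_probs))


def arithmetic_mean_prob(token_probs: Sequence[float]) -> float:
    """Arithmetic mean of chosen-token probabilities."""
    return sum(token_probs) / len(token_probs)


def kurtosis(dist: Sequence[float]) -> float:
    """Fourth standardized moment of the values in one token distribution.

    Assumption: non-excess (Pearson) kurtosis; returns 0.0 for a flat
    distribution, whose variance is zero.
    """
    n = len(dist)
    mean = sum(dist) / n
    var = sum((x - mean) ** 2 for x in dist) / n
    if var == 0.0:
        return 0.0
    fourth = sum((x - mean) ** 4 for x in dist) / n
    return fourth / var ** 2


def mean_kurtosis(token_dists: Sequence[Sequence[float]]) -> float:
    """Arithmetic mean of per-token distribution kurtosis."""
    return sum(kurtosis(d) for d in token_dists) / len(token_dists)


if __name__ == "__main__":
    chosen = [0.9, 0.6, 0.3]            # probability of each emitted token
    dists = [[0.9, 0.05, 0.03, 0.02],   # full distribution per position
             [0.6, 0.2, 0.1, 0.1],
             [0.3, 0.3, 0.2, 0.2]]
    print(geometric_mean_prob(chosen))
    print(arithmetic_mean_prob(chosen))
    print(mean_kurtosis(dists))
```

The geometric mean is never larger than the arithmetic mean, so a single low-confidence token pulls it down more sharply, which may make it the more conservative sentence-level score.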


Improving Estonian Text Simplification through Pretrained Language Models and Custom Datasets

arXiv.org Artificial Intelligence

This study introduces an approach to Estonian text simplification using two model architectures: a neural machine translation model and a fine-tuned large language model (LLaMA). Given the limited resources for Estonian, we developed a new dataset, the Estonian Simplification Dataset, combining translated data and GPT-4.0-generated simplifications. We benchmarked OpenNMT, a neural machine translation model that frames text simplification as a translation task, and fine-tuned the LLaMA model on our dataset to tailor it specifically for Estonian simplification. Manual evaluations on the test set show that the LLaMA model consistently outperforms OpenNMT in readability, grammaticality, and meaning preservation. These findings underscore the potential of large language models for low-resource languages and provide a basis for further research in Estonian text simplification.