Translationese-index: Using Likelihood Ratios for Graded and Generalizable Measurement of Translationese

Liu, Yikang, Zhang, Wanyang, Wang, Yiming, Tang, Jialong, Zhang, Pei, Yang, Baosong, Huang, Fei, Wang, Rui, Hu, Hai

Sep-22-2025–arXiv.org Artificial Intelligence

Translationese refers to linguistic properties that usually occur in translated texts. Previous works study translationese by framing it as a binary classification between original texts and translated texts. In this paper, we argue that translationese should be graded instead of binary and propose the first measure for translationese -- the translationese-index (T-index), computed from the likelihood ratios of two contrastively fine-tuned language models (LMs). We use synthesized translations and translations in the wild to evaluate T-index's generalizability in cross-domain settings and its validity against human judgments. Our results show that T-index can generalize to unseen genres, authors, and language pairs. Moreover, T-index computed using two 0.5B LMs fine-tuned on only 1-5k pairs of synthetic data can effectively capture translationese, as demonstrated by alignment with human pointwise ratings and pairwise judgments. Additionally, the correlation between T-index and existing machine translation (MT) quality estimation (QE) metrics such as BLEU and COMET is low, suggesting that T-index is not covered by these metrics and can serve as a complementary metric in MT QE.

machine learning, natural language, translationese, (20 more...)

arXiv.org Artificial Intelligence

Sep-22-2025

arXiv.org PDF

Add feedback

Country:
- Europe (1.00)
- Asia (0.68)
- North America > United States (0.67)

Genre:
- Research Report > New Finding (0.86)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Machine Translation (1.00)
  - Machine Learning (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found