Automatic Machine Translation Detection Using a Surrogate Multilingual Translation Model
García-Romero, Cristian; Esplà-Gomis, Miquel; Sánchez-Martínez, Felipe
arXiv.org Artificial Intelligence
Modern machine translation (MT) systems depend on large parallel corpora, often collected from the Internet. However, recent evidence indicates that (i) a substantial portion of these texts are machine-generated translations, and (ii) an overreliance on such synthetic content in training data can significantly degrade translation quality. As a result, filtering out non-human translations is becoming an essential pre-processing step in building high-quality MT systems. In this work, we propose a novel approach that directly exploits the internal representations of a surrogate multilingual MT model to distinguish between human and machine-translated sentences. Experimental results show that our method outperforms current state-of-the-art techniques, particularly for non-English language pairs, achieving gains of at least 5 percentage points of accuracy.
Nov-6-2025
- Country:
- Europe (1.00)
- Asia (1.00)
- North America > United States
- Minnesota (0.28)
- Genre:
- Research Report
- New Finding (1.00)
- Promising Solution (0.86)