Automatic Machine Translation Detection Using a Surrogate Multilingual Translation Model
García-Romero, Cristian, Esplà-Gomis, Miquel, Sánchez-Martínez, Felipe
arXiv.org Artificial Intelligence
Modern machine translation (MT) systems depend on large parallel corpora, often collected from the Internet. However, recent evidence indicates that (i) a substantial portion of these texts are machine-generated translations, and (ii) an overreliance on such synthetic content in training data can significantly degrade translation quality. As a result, filtering out non-human translations is becoming an essential pre-processing step in building high-quality MT systems. In this work, we propose a novel approach that directly exploits the internal representations of a surrogate multilingual MT model to distinguish between human and machine-translated sentences. Experimental results show that our method outperforms current state-of-the-art techniques, particularly for non-English language pairs, achieving gains of at least 5 percentage points in accuracy.
Nov-6-2025
- Country:
  - Asia
    - China
    - Middle East > Saudi Arabia
      - Asir Province > Abha (0.04)
    - Singapore (0.04)
    - Thailand > Bangkok
      - Bangkok (0.04)
  - Europe
  - North America
    - Dominican Republic (0.04)
    - Mexico > Mexico City
      - Mexico City (0.04)
    - United States
      - Florida > Miami-Dade County
        - Miami (0.04)
      - Maryland > Baltimore (0.04)
      - Minnesota > Hennepin County
        - Minneapolis (0.14)
- Genre:
  - Research Report
    - New Finding (1.00)
    - Promising Solution (0.86)
- Technology: