Towards Tailored Recovery of Lexical Diversity in Literary Machine Translation
Esther Ploeger, Huiyuan Lai, Rik van Noord, Antonio Toral
arXiv.org Artificial Intelligence
Machine translations are found to be lexically poorer than human translations. The loss of lexical diversity through MT is a problem for the automatic translation of literature, where it matters not only what is written, but also how it is written. Current methods for increasing lexical diversity in MT are rigid. Yet, as we demonstrate, the degree of lexical diversity can vary considerably across different novels. Thus, rather than aiming for a rigid increase in lexical diversity, we reframe the task as recovering what is lost in the machine translation process. We propose a novel approach that reranks translation candidates with a classifier that distinguishes between original and translated text. We evaluate our approach on 31 English-to-Dutch book translations and find that, for certain books, it recovers lexical diversity scores close to those of the human translations.
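The reranking idea can be illustrated with a minimal sketch. The abstract does not specify how candidates are generated, which classifier is used, or which lexical diversity metrics are reported, so everything below is an assumption: an n-best list is taken as given, the classifier is a hypothetical callable `p_original` returning the probability that a text is original (not translated), and type-token ratio stands in as a simple diversity measure. The Dutch sentences and classifier scores are made up purely for illustration.

```python
from typing import Callable, Sequence


def type_token_ratio(text: str) -> float:
    """Simple lexical diversity measure: unique tokens / total tokens.

    Lowercasing and whitespace tokenisation are simplifying assumptions;
    the paper may use other metrics and tokenisers.
    """
    tokens = text.lower().split()
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def rerank(candidates: Sequence[str], p_original: Callable[[str], float]) -> str:
    """Pick the candidate the classifier judges most 'original-like'.

    p_original is a hypothetical classifier: it maps a candidate
    translation to P(text is original rather than translated).
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=p_original)


if __name__ == "__main__":
    # Hypothetical n-best list for one source sentence (made-up Dutch).
    nbest = [
        "hij liep naar huis en hij liep naar de deur",
        "hij wandelde naar huis en slenterde naar de deur",
    ]
    # Stand-in for a trained original-vs-translated classifier:
    # fixed, invented probabilities used only to drive the example.
    fake_scores = {nbest[0]: 0.31, nbest[1]: 0.64}
    best = rerank(nbest, fake_scores.__getitem__)
    print(best)
    print(round(type_token_ratio(nbest[0]), 2), round(type_token_ratio(best), 2))
```

Here the top beam output has a type-token ratio of 0.7, while the candidate the stand-in classifier prefers scores about 0.89. In practice, diversity would be measured over a whole translated book rather than a single sentence.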
Aug-30-2024
- Country:
  - Asia
    - China > Hong Kong (0.04)
    - Indonesia > Bali (0.04)
    - Middle East > UAE
      - Abu Dhabi Emirate > Abu Dhabi (0.04)
    - Singapore (0.04)
  - Europe
    - Denmark > North Jutland
      - Aalborg (0.04)
    - Portugal
    - Ireland > Leinster
      - County Dublin > Dublin (0.04)
    - Belgium
      - Brussels-Capital Region > Brussels (0.04)
      - Flanders > East Flanders
        - Ghent (0.04)
    - Finland > Pirkanmaa
      - Tampere (0.04)
    - United Kingdom > England
      - Cambridgeshire > Cambridge (0.04)
    - Netherlands (0.04)
    - Spain (0.04)
    - Italy > Tuscany
      - Florence (0.04)
  - North America
    - Dominican Republic (0.04)
    - United States
      - Minnesota > Hennepin County
        - Minneapolis (0.14)
      - New Mexico > Santa Fe County
        - Santa Fe (0.04)
      - Oregon (0.04)
      - Pennsylvania > Philadelphia County
        - Philadelphia (0.04)
      - Washington > King County
        - Seattle (0.04)
- Genre:
  - Research Report (1.00)
- Technology: