Generating Diverse Translation with Perturbed kNN-MT

Nishida, Yuto, Morishita, Makoto, Kamigaito, Hidetaka, Watanabe, Taro

Feb-14-2024–arXiv.org Artificial Intelligence

Generating multiple translation candidates would enable users to choose the one that satisfies their needs. Although there has been work on diversified generation, there exists room for improving the diversity mainly because the previous methods do not address the overcorrection problem -- the model underestimates a prediction that is largely different from the training data, even if that prediction is likely. This paper proposes methods that generate more diverse translations by introducing perturbed k-nearest neighbor machine translation (kNN-MT). Our methods expand the search space of kNN-MT and help incorporate diverse words into candidates by addressing the overcorrection problem. Our experiments show that the proposed methods drastically improve candidate diversity and control the degree of diversity by tuning the perturbation's magnitude.

computational linguistic, spring summer collection, translation, (10 more...)

arXiv.org Artificial Intelligence

Feb-14-2024

arXiv.org PDF

Add feedback

Country:
- Oceania > Australia
  - Victoria > Melbourne (0.04)
- North America
  - Dominican Republic (0.04)
  - United States
    - Pennsylvania (0.04)
    - Maryland > Baltimore (0.04)
    - Washington > King County
      - Seattle (0.04)
    - Minnesota > Hennepin County
      - Minneapolis (0.14)
    - California > San Diego County
      - San Diego (0.04)
  - Canada > Ontario
    - Toronto (0.04)
- Europe
  - Spain (0.04)
  - Slovenia (0.04)
  - Middle East > Republic of Türkiye
    - Istanbul Province > Istanbul (0.04)
  - Italy
    - Tuscany > Florence (0.04)
    - Trentino-Alto Adige/Südtirol > Trentino Province
      - Trento (0.04)
  - Ireland > Leinster
    - County Dublin > Dublin (0.04)
  - France > Provence-Alpes-Côte d'Azur
    - Bouches-du-Rhône > Marseille (0.04)
  - Belgium > Brussels-Capital Region
    - Brussels (0.04)
- Asia
  - China > Hong Kong (0.04)
  - Middle East
    - UAE > Abu Dhabi Emirate
      - Abu Dhabi (0.04)
    - Republic of Türkiye > Istanbul Province
      - Istanbul (0.04)
  - Japan > Honshū
    - Kansai > Kyoto Prefecture > Kyoto (0.04)

Genre:
- Research Report > New Finding (0.67)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Natural Language > Machine Translation (1.00)
  - Machine Learning > Statistical Learning
    - Nearest Neighbor Methods (0.55)