Identifying Weaknesses in Machine Translation Metrics Through Minimum Bayes Risk Decoding: A Case Study for COMET

Sep-26-2022–arXiv.org Artificial Intelligence

Neural metrics have achieved impressive correlation with human judgements in the evaluation of machine translation systems, but before we can safely optimise towards such metrics, we should be aware of (and ideally eliminate) biases that cause bad translations to receive high scores. Our experiments show that sample-based Minimum Bayes Risk decoding can be used to explore and quantify such weaknesses. When applying this strategy to COMET for en-de and de-en, we find that COMET models are not sensitive enough to discrepancies in numbers and named entities. We further show that these biases are hard to fully remove by simply training on additional synthetic data, and we release our code and data to facilitate further experiments.
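
As a rough illustration of the sample-based Minimum Bayes Risk decoding mentioned above, the sketch below picks, from a pool of sampled translations, the one with the highest average utility against the other samples, which act as pseudo-references. It is a minimal sketch under stated assumptions, not the paper's implementation: `token_f1` is a hypothetical toy utility standing in for COMET (which also conditions on the source sentence), the function names are illustrative, and excluding the candidate from its own pseudo-references is one possible design choice.

```python
from collections import Counter
from typing import Callable, List


def token_f1(hypothesis: str, reference: str) -> float:
    """Toy utility (assumption): unigram F1 overlap, a stand-in for a neural metric."""
    hyp = Counter(hypothesis.split())
    ref = Counter(reference.split())
    overlap = sum((hyp & ref).values())  # clipped count of shared tokens
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def mbr_decode(samples: List[str], utility: Callable[[str, str], float]) -> str:
    """Return the sample with the highest mean utility against the other samples."""
    if not samples:
        raise ValueError("need at least one sample")
    if len(samples) == 1:
        return samples[0]
    best, best_score = samples[0], float("-inf")
    for i, candidate in enumerate(samples):
        # Every other sample serves as a pseudo-reference for this candidate.
        others = [s for j, s in enumerate(samples) if j != i]
        score = sum(utility(candidate, o) for o in others) / len(others)
        if score > best_score:
            best, best_score = candidate, score
    return best


samples = [
    "the cat sat on the mat",
    "the cat sat on a mat",
    "a dog ran away",
]
print(mbr_decode(samples, token_f1))
```

Because the selected output maximises the utility function, any blind spot in that function (for example, low sensitivity to a changed number or name) can surface in the chosen translations, which is what makes this decoding strategy useful for probing a metric.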
