When LLMs Struggle: Reference-less Translation Evaluation for Low-resource Languages
Archchana Sindhujan, Diptesh Kanojia, Constantin Orasan, Shenbin Qian
This paper investigates reference-less evaluation of machine translation for low-resource language pairs, known as quality estimation (QE). Segment-level QE is a challenging cross-lingual language understanding task that assigns a quality score (0-100) to translated output. We comprehensively evaluate large language models (LLMs) in zero-shot and few-shot scenarios and perform instruction fine-tuning using a novel prompt based on annotation guidelines. Our results indicate that prompt-based approaches are outperformed by encoder-based fine-tuned QE models. Our error analysis reveals tokenization issues, along with errors caused by transliteration and named entities, and argues for refining LLM pre-training for cross-lingual tasks. We publicly release our data and trained models for further research.
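The abstract's strongest approach is the encoder-based fine-tuned QE model. Below is a minimal sketch of that style of system, assuming an XLM-RoBERTa encoder with a linear regression head that maps a source/translation pair to a 0-100 score; the model name, pooling choice, and example segments are illustrative assumptions, not the authors' exact setup.

```python
# Minimal sketch of segment-level QE: an encoder (here xlm-roberta-base,
# an assumption) with a regression head predicting a 0-100 quality score
# for a (source, machine translation) pair. Illustrative only; the paper's
# exact architecture and training data may differ.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class SegmentQERegressor(nn.Module):
    def __init__(self, encoder_name: str = "xlm-roberta-base"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        # Single linear head mapping the pooled representation to a scalar.
        self.head = nn.Linear(hidden, 1)

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        # Use the first (<s>) token representation as the segment embedding.
        cls = out.last_hidden_state[:, 0]
        # Squash to [0, 1], then rescale to the 0-100 DA-style score range.
        return torch.sigmoid(self.head(cls)).squeeze(-1) * 100.0

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = SegmentQERegressor()

# Cross-lingual input: source segment and its MT output, encoded as a
# single sequence pair (hypothetical example segments).
batch = tokenizer(
    ["The committee approved the budget."],   # source
    ["samiti ne bajat ko manjoori dee."],     # MT output (romanized Hindi)
    padding=True, truncation=True, return_tensors="pt",
)
with torch.no_grad():
    score = model(batch["input_ids"], batch["attention_mask"])
print(f"Predicted quality score: {score.item():.1f} / 100")
```

In practice such a head would be fine-tuned with a regression loss (e.g., MSE) against human segment-level scores; the untrained sketch above only illustrates the input/output contract of segment-level QE.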
arXiv.org Artificial Intelligence
January 8, 2025