Large Language Models "Ad Referendum": How Good Are They at Machine Translation in the Legal Domain?

Briva-Iglesias, Vicent, Camargo, Joao Lucas Cavalheiro, Dogru, Gokhan

Feb-12-2024–arXiv.org Artificial Intelligence

This study evaluates the machine translation (MT) quality of two state-of-the-art large language models (LLMs) against a traditional neural machine translation (NMT) system across four language pairs in the legal domain. It combines automatic evaluation metrics (AEMs) and human evaluation (HE) by professional translators to assess translation ranking, fluency and adequacy. The results indicate that while Google Translate generally outperforms LLMs in AEMs, human evaluators rate LLMs, especially GPT-4, comparably or slightly better in terms of producing contextually adequate and fluent translations. This discrepancy suggests LLMs' potential in handling specialized legal terminology and context, highlighting the importance of human evaluation methods in assessing MT quality. The study underscores the evolving capabilities of LLMs in specialized domains and calls for reevaluation of traditional AEMs to better capture the nuances of LLM-generated translations.

gpt-4, machine translation, translation

Country:
- South America > Brazil
  - Paraná (0.04)
- North America > United States
  - New York > Monroe County
    - Rochester (0.04)
  - Massachusetts > Middlesex County
    - Cambridge (0.04)
- Europe
  - Spain (0.04)
  - Portugal > Lisbon
    - Lisbon (0.04)
  - Netherlands > South Holland
    - The Hague (0.04)
  - Middle East > Republic of Türkiye
    - Istanbul Province > Istanbul (0.04)
  - Finland > Pirkanmaa
    - Tampere (0.04)
- Asia > Middle East
  - UAE > Abu Dhabi Emirate
    - Abu Dhabi (0.04)
  - Republic of Türkiye > Istanbul Province
    - Istanbul (0.04)

Genre:
- Research Report > New Finding (0.93)

Industry:
- Law (1.00)
- Education > Educational Setting (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language
    - Machine Translation (1.00)
    - Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.96)