MENLI: Robust Evaluation Metrics from Natural Language Inference

Dec-25-2023 – arXiv.org Artificial Intelligence

Recently proposed BERT-based evaluation metrics for text generation perform well on standard benchmarks but are vulnerable to adversarial attacks, e.g., attacks that target information correctness. We argue that this stems (in part) from the fact that they are models of semantic similarity. In contrast, we develop evaluation metrics based on Natural Language Inference (NLI), which we deem a more appropriate modeling choice. We design a preference-based adversarial attack framework and show that our NLI-based metrics are much more robust to the attacks than the recent BERT-based metrics. On standard benchmarks, our NLI-based metrics outperform existing summarization metrics but perform below SOTA MT metrics. However, when combining existing metrics with our NLI metrics, we obtain both higher adversarial robustness (15% to 30%) and higher-quality metrics as measured on standard benchmarks (+5% to 30%).
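To make the preference-based setup and the idea of combining metrics more concrete, here is a minimal sketch. It is not the paper's implementation. The scorers `overlap_similarity` and `toy_entailment` are hypothetical toy stand-ins for a similarity-based metric and an NLI model, and the weighted average in `combine` is an assumed combination rule, since the abstract does not say how the metrics are combined. The test cases are invented for illustration. The check itself is simple: a metric passes a case when it scores a meaning-preserving paraphrase above an adversarial candidate that changes a fact.

```python
from typing import Callable, List, Tuple

# A scorer maps (reference, candidate) to a score in [0, 1].
Scorer = Callable[[str, str], float]


def overlap_similarity(ref: str, cand: str) -> float:
    """Toy stand-in for a similarity metric: Jaccard overlap of lowercased tokens."""
    a, b = set(ref.lower().split()), set(cand.lower().split())
    return len(a & b) / len(a | b) if a | b else 1.0


def toy_entailment(ref: str, cand: str) -> float:
    """Toy stand-in for an NLI score: 1.0 if negation and numbers agree, else 0.0."""
    def facts(text: str):
        toks = text.lower().split()
        negated = any(t in {"not", "no", "never"} for t in toks)
        numbers = {t for t in toks if t.isdigit()}
        return negated, numbers
    return 1.0 if facts(ref) == facts(cand) else 0.0


def combine(base: Scorer, nli: Scorer, weight: float) -> Scorer:
    """Assumed combination rule: weighted average, `weight` is the NLI share."""
    return lambda ref, cand: weight * nli(ref, cand) + (1.0 - weight) * base(ref, cand)


def preference_accuracy(metric: Scorer, cases: List[Tuple[str, str, str]]) -> float:
    """Fraction of (reference, paraphrase, adversarial) cases where the
    paraphrase scores strictly higher than the adversarial candidate."""
    wins = sum(metric(ref, para) > metric(ref, adv) for ref, para, adv in cases)
    return wins / len(cases)


# Invented examples: the adversarial candidate is lexically close but factually wrong.
cases = [
    ("the team won 3 games",
     "the squad was victorious in 3 matches",
     "the team won 5 games"),
    ("she did not sign the contract",
     "the contract was not signed by her",
     "she did sign the contract"),
]

combined = combine(overlap_similarity, toy_entailment, weight=0.5)

print(preference_accuracy(overlap_similarity, cases))  # 0.0
print(preference_accuracy(combined, cases))            # 1.0
```

In this toy setting the similarity-only scorer prefers the adversarial candidate in both cases because it shares more tokens with the reference, while the combined scorer prefers the paraphrase. With real models, the two scorers would be replaced by an actual similarity metric and an actual NLI classifier, and the weight would need to be tuned.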

computational linguistic, evaluation, metric

Country:
- Africa > Mali (0.04)
- North America
  - Dominican Republic (0.04)
  - Canada (0.04)
  - United States
    - Texas > Travis County
      - Austin (0.04)
    - Minnesota > Hennepin County
      - Minneapolis (0.14)
    - Louisiana > Orleans Parish
      - New Orleans (0.04)
- Europe
  - Russia (0.04)
  - Ukraine (0.04)
  - Italy > Tuscany
    - Florence (0.04)
  - Spain
    - Valencian Community > Valencia Province
      - Valencia (0.04)
    - Catalonia > Barcelona Province
      - Barcelona (0.04)
  - Denmark > Capital Region
    - Copenhagen (0.04)
  - Sweden > Uppsala County
    - Uppsala (0.04)
  - United Kingdom > England
    - Greater London > London > Wimbledon (0.04)
  - Portugal > Lisbon
    - Lisbon (0.04)
  - Germany > Hesse
    - Darmstadt Region > Darmstadt (0.04)
  - Ireland > Leinster
    - County Dublin > Dublin (0.04)
- Asia
  - Russia (0.04)
  - China > Hong Kong (0.04)
  - Middle East > UAE
    - Abu Dhabi Emirate > Abu Dhabi (0.04)

Genre:
- Research Report > New Finding (0.67)

Industry:
- Information Technology > Security & Privacy (0.54)
- Government
  - Military (0.54)
  - Regional Government (0.46)

Technology:
- Information Technology > Artificial Intelligence > Natural Language
  - Text Processing (0.88)
  - Machine Translation (0.69)