Contextual Metric Meta-Evaluation by Measuring Local Metric Accuracy

Mar-25-2025–arXiv.org Artificial Intelligence

Meta-evaluation of automatic evaluation metrics -- assessing evaluation metrics themselves -- is crucial for accurately benchmarking natural language processing systems and has implications for scientific inquiry, production model development, and policy enforcement. While existing approaches to metric meta-evaluation focus on general statements about the absolute and relative quality of metrics across arbitrary system outputs, in practice, metrics are applied in highly contextual settings, often measuring the performance for a highly constrained set of system outputs. For example, we may only be interested in evaluating a specific model or class of models. We introduce a method for contextual metric meta-evaluation by comparing the local metric accuracy of evaluation metrics. Across translation, speech recognition, and ranking tasks, we demonstrate that the local metric accuracies vary both in absolute value and relative effectiveness as we shift across evaluation contexts. This observed variation highlights the importance of adopting context-specific metric evaluations over global ones.

accuracy, artificial intelligence, natural language, (14 more...)

arXiv.org Artificial Intelligence

Mar-25-2025

arXiv.org PDF

Add feedback

Country:
- North America
  - Dominican Republic (0.04)
  - United States
    - Washington > King County
      - Seattle (0.04)
    - Texas > Travis County
      - Austin (0.04)
    - Pennsylvania > Allegheny County
      - Pittsburgh (0.04)
  - Canada > Ontario
    - Toronto (0.04)
- Europe
  - United Kingdom > England
    - Cambridgeshire > Cambridge (0.04)
  - Portugal > Lisbon
    - Lisbon (0.04)
  - Italy > Tuscany
    - Florence (0.04)
  - Denmark > Capital Region
    - Copenhagen (0.04)
- Asia
  - Singapore (0.04)
  - Thailand > Bangkok
    - Bangkok (0.04)
  - Middle East > UAE
    - Abu Dhabi Emirate > Abu Dhabi (0.04)

Genre:
- Research Report
  - New Finding (1.00)
  - Experimental Study (0.68)

Technology:
- Information Technology > Artificial Intelligence
  - Speech > Speech Recognition (0.69)
  - Natural Language
    - Machine Translation (1.00)
    - Text Processing (0.94)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found