OpeNLGauge: An Explainable Metric for NLG Evaluation with Open-Weights LLMs
Ivan Kartáč, Mateusz Lango, Ondřej Dušek
arXiv.org Artificial Intelligence
Large Language Models (LLMs) have demonstrated great potential as evaluators of NLG systems, allowing for high-quality, reference-free, and multi-aspect assessments. However, existing LLM-based metrics suffer from two major drawbacks: reliance on proprietary models to generate training data or perform evaluations, and a lack of fine-grained, explanatory feedback. In this paper, we introduce OpeNLGauge, a fully open-source, reference-free NLG evaluation metric that provides accurate explanations based on error spans. OpeNLGauge is available as a two-stage ensemble of larger open-weight LLMs, or as a small fine-tuned evaluation model, with confirmed generalizability to unseen tasks, domains and aspects. Our extensive meta-evaluation shows that OpeNLGauge achieves competitive correlation with human judgments, outperforming state-of-the-art models on certain tasks while maintaining full reproducibility and providing explanations more than twice as accurate.
March 14, 2025
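The abstract reports OpeNLGauge's meta-evaluation as correlation with human judgments but does not name the coefficient. The following is a minimal sketch that assumes a rank correlation such as Spearman's ρ. It shows how agreement between a metric's scores and human ratings over the same set of outputs could be computed, with tied scores given averaged ranks. The function names `average_ranks`, `pearson` and `spearman` are hypothetical illustrations and are not part of OpeNLGauge or its released code.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Assign 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


def spearman(metric_scores: Sequence[float], human_scores: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation computed on (tie-averaged) ranks."""
    if len(metric_scores) != len(human_scores) or len(metric_scores) < 2:
        raise ValueError("need two equal-length sequences of at least 2 scores")
    return pearson(average_ranks(metric_scores), average_ranks(human_scores))


# Hypothetical scores for five system outputs (illustrative only, not paper data).
metric = [4.0, 2.5, 3.0, 5.0, 1.0]
human = [5, 2, 3, 4, 1]
print(spearman(metric, human))  # 0.9
```

A rank-based coefficient only checks whether the metric orders outputs the way humans do. It does not check whether the two scales match, which is why it is a common choice when a metric's score range differs from the human rating scale.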