Beyond One-Size-Fits-All: Inversion Learning for Highly Effective NLG Evaluation Prompts
Hanhua Hong, Chenghao Xiao, Yang Wang, Yiqi Liu, Wenge Rong, Chenghua Lin
arXiv.org Artificial Intelligence
Evaluating natural language generation systems is challenging due to the diversity of valid outputs. While human evaluation is the gold standard, it suffers from inconsistencies, lack of standardisation, and demographic biases, limiting reproducibility. LLM-based evaluators offer a scalable alternative but are highly sensitive to prompt design, where small variations can lead to significant discrepancies. In this work, we propose an inversion learning method that learns effective reverse mappings from model outputs back to their input instructions, enabling the automatic generation of highly effective, model-specific evaluation prompts. Our method requires only a single evaluation sample and eliminates the need for time-consuming manual prompt engineering, thereby improving both efficiency and robustness. Our work points toward a new direction for more robust and efficient LLM-based evaluation.
Sep-11-2025
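The inversion idea described in the abstract can be sketched in outline: invert a model's output back to a plausible instruction, then reuse that instruction as a model-specific evaluation prompt. The sketch below is a minimal illustration of this pipeline, not the paper's method; the `invert_output_to_instruction` stub stands in for the learned inversion model, which is not reproduced here, and all function names are hypothetical.

```python
def invert_output_to_instruction(output_text: str) -> str:
    """Stand-in for a learned inversion model (hypothetical).

    The paper trains a model to map outputs back to the input
    instructions that could have elicited them; here we return a
    fixed placeholder instruction for illustration only.
    """
    return "Summarise the following article in one sentence."


def build_evaluation_prompt(sample_output: str, candidate: str) -> str:
    """Turn a single evaluation sample into a model-specific
    evaluation prompt, using the inverted instruction as the task
    description rather than a hand-engineered prompt."""
    inverted_instruction = invert_output_to_instruction(sample_output)
    return (
        f"Task: {inverted_instruction}\n"
        f"Candidate response: {candidate}\n"
        "Rate how well the response fulfils the task on a 1-5 scale."
    )


# One evaluation sample suffices to derive the prompt (per the abstract).
prompt = build_evaluation_prompt(
    sample_output="The study finds prompt wording strongly affects LLM judges.",
    candidate="LLM evaluators are sensitive to prompt phrasing.",
)
print(prompt)
```

In a real setting, the derived prompt would then be sent to the evaluator LLM; the point of the sketch is only the flow from one sample, through inversion, to an evaluation prompt.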