Beyond One-Size-Fits-All: Inversion Learning for Highly Effective NLG Evaluation Prompts
Hanhua Hong, Chenghao Xiao, Yang Wang, Yiqi Liu, Wenge Rong, Chenghua Lin
arXiv.org Artificial Intelligence
Evaluating natural language generation systems is challenging due to the diversity of valid outputs. While human evaluation is the gold standard, it suffers from inconsistencies, lack of standardisation, and demographic biases, limiting reproducibility. LLM-based evaluators offer a scalable alternative but are highly sensitive to prompt design, where small variations can lead to significant discrepancies. In this work, we propose an inversion learning method that learns effective reverse mappings from model outputs back to their input instructions, enabling the automatic generation of highly effective, model-specific evaluation prompts. Our method requires only a single evaluation sample and eliminates the need for time-consuming manual prompt engineering, thereby improving both efficiency and robustness. Our work points toward a new direction for more robust and efficient LLM-based evaluation.
Sep-11-2025
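The inversion idea described in the abstract can be sketched in outline: invert a model's output back to a plausible instruction, then reuse that instruction as a model-specific evaluation prompt. The sketch below is a minimal illustration of this pipeline, not the paper's method; the `invert_output_to_instruction` stub stands in for the learned inversion model, which is not reproduced here, and all function names are hypothetical.

```python
def invert_output_to_instruction(output_text: str) -> str:
    """Stand-in for a learned inversion model (hypothetical).

    The paper trains a model to map outputs back to the input
    instructions that could have elicited them; here we return a
    fixed placeholder instruction for illustration only.
    """
    return "Summarise the following article in one sentence."


def build_evaluation_prompt(sample_output: str, candidate: str) -> str:
    """Turn a single evaluation sample into a model-specific
    evaluation prompt, using the inverted instruction as the task
    description rather than a hand-engineered prompt."""
    inverted_instruction = invert_output_to_instruction(sample_output)
    return (
        f"Task: {inverted_instruction}\n"
        f"Candidate response: {candidate}\n"
        "Rate how well the response fulfils the task on a 1-5 scale."
    )


# One evaluation sample suffices to derive the prompt (per the abstract).
prompt = build_evaluation_prompt(
    sample_output="The study finds prompt wording strongly affects LLM judges.",
    candidate="LLM evaluators are sensitive to prompt phrasing.",
)
print(prompt)
```

In a real setting, the derived prompt would then be sent to the evaluator LLM; the point of the sketch is only the flow from one sample, through inversion, to an evaluation prompt.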