DocAsRef: An Empirical Study on Repurposing Reference-Based Summary Quality Metrics Reference-Freely
Bao, Forrest Sheng, Tu, Ruixuan, Luo, Ge, Yang, Yinfei, Li, Hebi, Qiu, Minghui, He, Youbiao, Chen, Cen
arXiv.org Artificial Intelligence
Automated summary quality assessment falls into two categories: reference-based and reference-free. Reference-based metrics, historically deemed more accurate due to the additional information provided by human-written references, are limited by their reliance on human input. In this paper, we hypothesize that the comparison methodologies used by some reference-based metrics to evaluate a system summary against its corresponding reference can be effectively adapted to assess it against its source document, thereby transforming these metrics into reference-free ones. Experimental results support this hypothesis. After being repurposed reference-freely, the zero-shot BERTScore using the pretrained DeBERTa-large-MNLI model, which has fewer than 0.5B parameters, consistently outperforms its original reference-based version across various aspects on the SummEval and Newsroom datasets. It also outperforms most existing reference-free metrics and closely competes with zero-shot summary evaluators based on GPT-3.5.
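The repurposing idea can be illustrated with a minimal sketch: a BERTScore-style greedy matching function is written once, and the only change between the reference-based and reference-free settings is what is passed as the second argument. This is not the authors' implementation. The function names and the `toy_embed` character-count embedding are hypothetical stand-ins chosen so the example runs without downloading a model; the metric studied in the abstract uses contextual embeddings from DeBERTa-large-MNLI instead.

```python
import math
from typing import Callable, List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def toy_embed(token: str) -> Vector:
    """Hypothetical stand-in embedding: letter counts over a-z.

    Assumption for illustration only; a real setup would use
    contextual token embeddings from a pretrained language model.
    """
    vec = [0.0] * 26
    for ch in token.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec


def greedy_match_f1(
    candidate: str,
    target: str,
    embed: Callable[[str], Vector] = toy_embed,
) -> float:
    """BERTScore-style F1 from greedy token matching.

    Precision: each candidate token is matched to its most similar
    target token. Recall: each target token is matched to its most
    similar candidate token.
    """
    cand = [embed(t) for t in candidate.split()]
    targ = [embed(t) for t in target.split()]
    if not cand or not targ:
        return 0.0
    precision = sum(max(cosine(c, t) for t in targ) for c in cand) / len(cand)
    recall = sum(max(cosine(t, c) for c in cand) for t in targ) / len(targ)
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def score_reference_based(summary: str, reference: str) -> float:
    """Original usage: compare the summary with a human-written reference."""
    return greedy_match_f1(summary, reference)


def score_doc_as_ref(summary: str, document: str) -> float:
    """Repurposed usage: compare the summary with its source document."""
    return greedy_match_f1(summary, document)
```

Calling `score_doc_as_ref(summary, document)` needs no human-written reference, which is the property the abstract describes. Because a source document is usually much longer than a summary, the recall term behaves differently from the reference-based case, so a real evaluation may prefer to look at precision, recall, and F1 separately.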
Nov-26-2023
- Country:
  - Asia (0.68)
  - Europe (0.67)
  - North America > United States > Wisconsin > Dane County > Madison (0.14)
- Genre:
  - Overview (0.46)
  - Research Report (0.65)