DocAsRef: An Empirical Study on Repurposing Reference-Based Summary Quality Metrics Reference-Freely

Bao, Forrest Sheng, Tu, Ruixuan, Luo, Ge, Yang, Yinfei, Li, Hebi, Qiu, Minghui, He, Youbiao, Chen, Cen

Nov-26-2023–arXiv.org Artificial Intelligence

Automated summary quality assessment falls into two categories: reference-based and reference-free. Reference-based metrics, historically deemed more accurate due to the additional information provided by human-written references, are limited by their reliance on human input. In this paper, we hypothesize that the comparison methodologies used by some reference-based metrics to evaluate a system summary against its corresponding reference can be effectively adapted to assess it against its source document, thereby transforming these metrics into reference-free ones. Experimental results support this hypothesis. After being repurposed reference-freely, the zero-shot BERTScore using the pretrained DeBERTa-large-MNLI model (fewer than 0.5B parameters) consistently outperforms its original reference-based version across various aspects on the SummEval and Newsroom datasets. It also outperforms most existing reference-free metrics and is closely competitive with zero-shot summary evaluators based on GPT-3.5.
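The repurposing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `embed` function below is a hypothetical toy stand-in (character-bigram counts) for the contextual embeddings a model such as DeBERTa-large-MNLI would produce, and the example texts are invented for illustration. The sketch only shows that a BERTScore-style greedy matching routine can be turned reference-free by passing the source document where the human-written reference would normally go.

```python
import math
import re
from collections import Counter


def embed(token):
    """Toy stand-in for a contextual embedding (assumption, not DeBERTa):
    a bag of character bigrams over the padded token."""
    padded = f"#{token}#"
    return Counter(padded[i:i + 2] for i in range(len(padded) - 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm_a = sum(v * v for v in a.values())
    norm_b = sum(v * v for v in b.values())
    denom = math.sqrt(norm_a * norm_b)
    return dot / denom if denom else 0.0


def greedy_match_score(candidate, target):
    """BERTScore-style greedy matching of candidate tokens against target tokens.

    Returns (precision, recall, f1). `target` can be a human reference
    (reference-based use) or the source document (reference-free use).
    """
    cand = [embed(t) for t in re.findall(r"\w+", candidate.lower())]
    targ = [embed(t) for t in re.findall(r"\w+", target.lower())]
    if not cand or not targ:
        return 0.0, 0.0, 0.0
    # Precision: each candidate token is matched to its most similar target token.
    precision = sum(max(cosine(c, t) for t in targ) for c in cand) / len(cand)
    # Recall: each target token is matched to its most similar candidate token.
    recall = sum(max(cosine(t, c) for c in cand) for t in targ) / len(targ)
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Invented example texts, for illustration only.
document = ("The city council approved the new budget on Monday "
            "after a long debate over school funding.")
reference = "Council passes budget following school funding debate."
summary = "The city council approved the new budget after a debate."

# Reference-based use: compare the system summary with a human-written reference.
ref_based = greedy_match_score(summary, reference)
# Reference-free use: the same routine, with the source document as the target.
ref_free = greedy_match_score(summary, document)

print("reference-based (P, R, F1):", ref_based)
print("reference-free  (P, R, F1):", ref_free)
```

In the reference-free call, recall is computed against every token of the much longer document, so which of precision, recall, or F1 is most informative in that setting is a design choice this sketch leaves open.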

bertscore, computational linguistic, metric



Country:
- North America > United States
  - Wisconsin > Dane County
    - Madison (0.14)
  - Washington > King County
    - Seattle (0.04)
  - Iowa > Story County
    - Ames (0.04)
  - California > Santa Clara County
    - Sunnyvale (0.04)
- Europe
  - Ukraine (0.04)
  - Spain > Catalonia
    - Barcelona Province > Barcelona (0.04)
  - Denmark > Capital Region
    - Copenhagen (0.04)
- Asia
  - Middle East > UAE
    - Abu Dhabi Emirate > Abu Dhabi (0.04)
  - China
    - Hong Kong (0.04)
    - Shanghai > Shanghai (0.04)

Genre:
- Research Report (0.65)
- Overview (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.68)