EVA-Score: Evaluation of Long-form Summarization on Informativeness through Extraction and Validation
Yuchen Fan, Xin Zhong, Chengsi Wang, Gaoche Wu, Bowen Zhou
–arXiv.org Artificial Intelligence
Summarization is a fundamental task in natural language processing (NLP). Since the advent of large language models (LLMs) such as GPT-4 and Claude, increasing attention has been paid to long-form summarization, whose input sequences are much longer and therefore contain more information. Current evaluation metrics are either similarity-based, like ROUGE and BERTScore, which rely on surface similarity and fail to account for informativeness, or LLM-based, which lack a quantitative analysis of information richness and are rather subjective. In this paper, we propose a new evaluation metric called EVA-Score, which combines Atomic Fact Chain Generation and document-level relation extraction to automatically measure informativeness and produce a definite information score. Experimental results show that our metric achieves state-of-the-art correlation with human judgments. We also comprehensively re-evaluate the performance of LLMs on long-form summarization from the information perspective, suggesting future directions for using LLMs in long-form summarization.
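The abstract describes informativeness as a recall-style quantity: atomic facts are extracted from the reference, each is validated against the candidate summary, and the score reflects how many are supported. A minimal sketch of that idea is below; the fact lists and the exact-match `validate` placeholder are illustrative assumptions, since the paper's actual pipeline uses Atomic Fact Chain Generation, document-level relation extraction, and LLM-based validation rather than string matching.

```python
# Hedged sketch of the informativeness idea behind EVA-Score:
# count how many reference atomic facts are supported by the candidate.
# The validator here is a placeholder (exact string match); the paper
# validates facts with an LLM-based pipeline instead.

def information_score(reference_facts, candidate_facts, validate=None):
    """Fraction of reference atomic facts supported by the candidate."""
    if validate is None:
        # Placeholder validator: exact membership test.
        validate = lambda fact, facts: fact in facts
    if not reference_facts:
        return 0.0
    supported = sum(1 for f in reference_facts if validate(f, candidate_facts))
    return supported / len(reference_facts)


# Toy example with hypothetical atomic facts:
ref = ["GPT-4 is an LLM", "EVA-Score measures informativeness"]
cand = ["GPT-4 is an LLM"]
print(information_score(ref, cand))  # 0.5
```

Swapping in a stronger validator (e.g., an entailment model or an LLM prompt) changes only the `validate` argument, which is why the score remains a single definite number regardless of how validation is implemented.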
Jul-6-2024