QAPyramid: Fine-grained Evaluation of Content Selection for Text Summarization
Zhang, Shiyue, Wan, David, Cattan, Arie, Klein, Ayal, Dagan, Ido, Bansal, Mohit
arXiv.org Artificial Intelligence
How to properly conduct human evaluations for text summarization is a longstanding challenge. The Pyramid human evaluation protocol, which assesses content selection by breaking the reference summary into sub-units and verifying their presence in the system summary, has been widely adopted. However, it suffers from a lack of systematicity in the definition and granularity of the sub-units. We address these problems by proposing QAPyramid, which decomposes each reference summary into finer-grained question-answer (QA) pairs according to the QA-SRL framework. We collect QA-SRL annotations for reference summaries from CNN/DM and evaluate 10 summarization systems, resulting in 8.9K QA-level annotations. We show that, compared to Pyramid, QAPyramid provides more systematic and fine-grained content selection evaluation while maintaining high inter-annotator agreement without needing expert annotations. Furthermore, we propose metrics that automate the evaluation pipeline and achieve higher correlations with QAPyramid than other widely adopted metrics, allowing future work to accurately and efficiently benchmark summarization systems.
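To make the protocol concrete, here is a minimal sketch of how a QA-level content selection score could be aggregated for one system summary. It assumes the score is simply the fraction of reference QA pairs that annotators judge to be present in the system summary; the abstract does not spell out the aggregation, so that formula, the `QAPair` class, the `qa_presence_score` function, and the example questions are all hypothetical illustrations rather than the authors' implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QAPair:
    """One QA-SRL-style question-answer pair derived from a reference summary."""
    question: str
    answer: str


def qa_presence_score(judgments: list[bool]) -> float:
    """Fraction of reference QA pairs judged present in the system summary.

    ASSUMPTION: a plain average over binary presence judgments; the source
    text does not state the exact aggregation.
    """
    if not judgments:
        raise ValueError("at least one QA-level judgment is required")
    return sum(judgments) / len(judgments)


# Hypothetical QA pairs for a single reference summary.
reference_qas = [
    QAPair("Who scored something?", "the striker"),
    QAPair("What did someone score?", "the winning goal"),
    QAPair("When did someone score something?", "in the final minute"),
    QAPair("Where did someone score something?", "at the home stadium"),
]

# Hypothetical annotator judgments: is each QA pair present in the system summary?
judgments = [True, True, False, True]

score = qa_presence_score(judgments)
print(f"{score:.2f}")
```

Averaging binary judgments keeps the score on a 0 to 1 scale for each summary, so per-summary scores can be averaged again to compare systems.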
Dec-9-2024
- Genre:
- Research Report (0.64)