News Summarization and Evaluation in the Era of GPT-3
Goyal, Tanya; Li, Junyi Jessy; Durrett, Greg
arXiv.org Artificial Intelligence
The recent success of prompting large language models like GPT-3 has led to a paradigm shift in NLP research. In this paper, we study its impact on text summarization, focusing on the classic benchmark domain of news summarization. First, we investigate how GPT-3 compares against fine-tuned models trained on large summarization datasets. We show that not only do humans overwhelmingly prefer GPT-3 summaries, prompted using only a task description, but these also do not suffer from common dataset-specific issues such as poor factuality. Next, we study what this means for evaluation, particularly the role of gold standard test sets. Our experiments show that both reference-based and reference-free automatic metrics cannot reliably evaluate GPT-3 summaries. Finally, we evaluate models on a setting beyond generic summarization, specifically keyword-based summarization, and show how dominant fine-tuning approaches compare to prompting. To support further research, we release: (a) a corpus of 10K generated summaries from fine-tuned and prompt-based models across 4 standard summarization benchmarks, (b) 1K human preference judgments comparing different systems for generic- and keyword-based summarization.
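A reference-based metric scores a summary by how much it overlaps with a gold reference summary. As a result, a fluent summary that paraphrases the reference instead of copying its wording can receive a low score. The sketch below uses a simplified unigram-overlap F1 in the spirit of ROUGE-1 to show this effect. The function name `unigram_f1` and the example sentences are hypothetical. This is not the metric implementation used in the paper, and it does not reproduce the paper's experiments.

```python
from collections import Counter


def unigram_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1-style F1 (illustrative assumption, not the paper's metric).

    Computes clipped unigram overlap between a candidate summary and a single
    reference. Tokenization is lowercase + whitespace split only: no stemming,
    no punctuation handling, no multi-reference support.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: each token counts at most as often as it appears in both texts.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: a near-copy of the reference vs. a faithful paraphrase.
reference = "the council approved the new budget on monday"
close_copy = "the council approved the budget on monday"
paraphrase = "city leaders passed next year's spending plan"

print(round(unigram_f1(close_copy, reference), 3))  # prints 0.933
print(round(unigram_f1(paraphrase, reference), 3))  # prints 0.0
```

The paraphrase gets zero credit because none of its words appear in the reference. Whether it is a good summary does not affect the score. This is one general way that overlap-based scores can disagree with human preferences.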
May-23-2023