Abstractive Summarization of Large Document Collections Using GPT
Liu, Sengjie, Healey, Christopher G.
arXiv.org Artificial Intelligence
This paper proposes a method of abstractive summarization designed to scale to document collections instead of individual documents. Our approach applies a combination of semantic clustering, document size reduction within topic clusters, semantic chunking of a cluster's documents, GPT-based summarization and concatenation, and a combined sentiment and text visualization of each topic to support exploratory data analysis. Statistical comparison of our results to the existing state-of-the-art systems BART, BRIO, PEGASUS, and MoCa using ROUGE summary scores showed statistically equivalent performance with BART and PEGASUS on the CNN/Daily Mail test dataset, and with BART on the Gigaword test dataset. This finding is promising since we view document collection summarization as more challenging than individual document summarization. We conclude with a discussion of how issues of scale are
Oct-9-2023
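The abstract compares systems by ROUGE summary scores. As a rough illustration of what such a score measures, the following is a minimal sketch of ROUGE-N computed from clipped n-gram overlap between a candidate summary and a reference summary. It is not the authors' evaluation code: the function name `rouge_n`, the whitespace tokenization, and the lowercasing are assumptions made for this example, and the abstract does not say which ROUGE variants or tooling were used.

```python
from collections import Counter


def rouge_n(candidate: str, reference: str, n: int = 1) -> dict:
    """Illustrative ROUGE-N: precision, recall, and F1 from clipped n-gram overlap.

    Assumption: tokens are lowercased, whitespace-separated words. Published
    evaluations usually add stemming and more careful tokenization.
    """

    def ngrams(text: str) -> Counter:
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    # Counter intersection keeps the minimum count of each shared n-gram,
    # so a repeated n-gram is only credited as often as the reference has it.
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    scores = rouge_n("the cat sat on the mat", "the cat lay on the mat", n=1)
    print(scores)
```

Recall reflects how much of the reference the candidate covers, and precision reflects how much of the candidate is supported by the reference; reported ROUGE figures are typically averages of such scores over a test set.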
- Genre:
  - Overview (1.00)
  - Research Report > New Finding (0.48)
- Technology:
  - Information Technology
    - Information Management (1.00)
    - Data Science > Data Mining (1.00)
    - Communications > Social Media (1.00)
    - Human Computer Interaction (0.93)
    - Artificial Intelligence
      - Representation & Reasoning (1.00)
      - Cognitive Science (0.93)
      - Natural Language
        - Text Processing (1.00)
        - Large Language Model (1.00)
        - Chatbot (1.00)
        - Information Extraction (0.95)
        - Discourse & Dialogue (0.70)
      - Machine Learning
        - Statistical Learning (1.00)
        - Neural Networks > Deep Learning (1.00)
        - Learning Graphical Models > Directed Networks
          - Bayesian Learning (0.46)