BookSum: A Collection of Datasets for Long-form Narrative Summarization
Kryściński, Wojciech; Rajani, Nazneen; Agarwal, Divyansh; Xiong, Caiming; Radev, Dragomir
arXiv.org Artificial Intelligence
The majority of available text summarization datasets include short-form source documents that lack long-range causal and temporal dependencies, and often contain strong layout and stylistic biases. While relevant, such datasets will offer limited challenges for future generations of text summarization systems. We address these issues by introducing BookSum, a collection of datasets for long-form narrative summarization. Our dataset covers source documents from the literature domain, such as novels, plays and stories, and includes highly abstractive, human-written summaries at three levels of granularity of increasing difficulty: paragraph-, chapter-, and book-level. The domain and structure of our dataset pose a unique set of challenges for summarization systems, which include processing very long documents, non-trivial causal and temporal dependencies, and rich discourse structures. To facilitate future work, we trained and evaluated multiple extractive and abstractive summarization models as baselines for our dataset.
Dec-6-2022
- Country:
  - Africa > Ethiopia
    - Addis Ababa > Addis Ababa (0.04)
  - Asia > China
    - Hong Kong (0.04)
  - Europe
    - Belgium > Brussels-Capital Region
      - Brussels (0.04)
    - Czechia > Prague (0.04)
    - Germany > Berlin (0.04)
    - Italy
      - Trentino-Alto Adige/Südtirol > Trentino Province
        - Trento (0.04)
      - Tuscany > Florence (0.04)
    - Spain > Catalonia
      - Barcelona Province > Barcelona (0.04)
    - United Kingdom > England (0.04)
  - North America
    - Canada > British Columbia
    - United States
      - Alaska > Anchorage Municipality
        - Anchorage (0.04)
      - Hawaii > Honolulu County
        - Honolulu (0.04)
      - Louisiana > Orleans Parish
        - New Orleans (0.04)
      - Minnesota > Hennepin County
        - Minneapolis (0.14)
      - New York (0.04)
  - South America > Chile
- Genre:
  - Research Report > New Finding (0.45)
- Industry:
  - Health & Medicine (0.46)
  - Law (0.67)
- Technology: