BookWorm: A Dataset for Character Description and Analysis
Papoudakis, Argyrios; Lapata, Mirella; Keller, Frank
arXiv.org Artificial Intelligence
Characters are at the heart of every story, driving the plot and engaging readers. In this study, we explore the understanding of characters in full-length books, which contain complex narratives and numerous interacting characters. We define two tasks: character description, which generates a brief factual profile, and character analysis, which offers an in-depth interpretation, including character development, personality, and social context. We introduce the BookWorm dataset, pairing books from Project Gutenberg with human-written descriptions and analyses. Using this dataset, we evaluate state-of-the-art long-context models in zero-shot and fine-tuning settings, using both retrieval-based and hierarchical processing for book-length inputs. Our findings show that retrieval-based approaches outperform hierarchical ones in both tasks. Additionally, fine-tuned models using coreference-based retrieval produce the most factual descriptions, as measured by fact- and entailment-based metrics. We hope our dataset, experiments, and analysis will inspire further research in character-based narrative understanding.
Oct-14-2024
- Country:
  - Asia
    - British Indian Ocean Territory > Diego Garcia (0.04)
    - China > Hong Kong (0.04)
    - Middle East > Saudi Arabia
      - Asir Province > Abha (0.04)
    - Singapore (0.04)
  - Europe
    - Bulgaria > Sofia City Province
      - Sofia (0.04)
    - Greece (0.04)
    - Ireland > Leinster
      - County Dublin > Dublin (0.04)
    - Italy > Calabria
      - Catanzaro Province > Catanzaro (0.04)
    - Middle East > Malta (0.04)
    - Spain > Catalonia
      - Barcelona Province > Barcelona (0.04)
    - United Kingdom (0.14)
  - North America
    - Cuba (0.04)
    - Dominican Republic (0.04)
    - United States
      - California > Los Angeles County
        - Los Angeles (0.04)
      - Illinois > Cook County
        - Chicago (0.04)
      - Louisiana > Orleans Parish
        - New Orleans (0.04)
      - Maryland > Montgomery County
        - Gaithersburg (0.04)
      - Minnesota > Hennepin County
        - Minneapolis (0.14)
      - Texas (0.04)
  - South America > Chile
- Genre:
  - Research Report > New Finding (1.00)
- Industry:
  - Health & Medicine > Therapeutic Area (0.92)
- Technology: