Evaluating LLM Story Generation through Large-scale Network Analysis of Social Structures
arXiv.org Artificial Intelligence
Evaluating the creative capabilities of large language models (LLMs) in complex tasks often requires human assessments that are difficult to scale. We introduce a novel, scalable methodology for evaluating LLM story generation by analyzing underlying social structures in narratives as signed character networks. To demonstrate its effectiveness, we conduct a large-scale comparative analysis using networks from over 1,200 stories, generated by four leading LLMs (GPT-4o, GPT-4o mini, Gemini 1.5 Pro, and Gemini 1.5 Flash) and a human-written corpus. Our findings, based on network properties like density, clustering, and signed edge weights, show that LLM-generated stories consistently exhibit a strong bias toward tightly knit, positive relationships, which aligns with findings from prior research using human assessment. Our proposed approach provides a valuable tool for evaluating limitations and tendencies in the creative storytelling of current and future LLMs.
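To make the network properties named in the abstract concrete, the following is a minimal sketch of how density, average clustering, and the share of positive edges could be computed for one signed character network. It is not the authors' pipeline: the function name `network_stats`, the edge representation, and the four-character toy story are all assumptions made for illustration.

```python
from itertools import combinations


def network_stats(nodes, edges):
    """Summarise an undirected signed character network.

    nodes: list of character names.
    edges: list of (a, b, weight) tuples; weight > 0 is read as a positive
           relationship and weight < 0 as a negative one (an assumption).
    """
    # Store each undirected edge once, keyed by the unordered pair.
    weights = {frozenset((a, b)): w for a, b, w in edges}

    n = len(nodes)
    possible = n * (n - 1) / 2
    density = len(weights) / possible if possible else 0.0

    neighbors = {v: set() for v in nodes}
    for a, b, _ in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    # Local clustering: fraction of a character's neighbour pairs that are
    # themselves connected; characters with fewer than two neighbours get 0.
    local = []
    for v in nodes:
        k = len(neighbors[v])
        if k < 2:
            local.append(0.0)
            continue
        linked = sum(
            1
            for a, b in combinations(sorted(neighbors[v]), 2)
            if frozenset((a, b)) in weights
        )
        local.append(linked / (k * (k - 1) / 2))
    clustering = sum(local) / n if n else 0.0

    positive = sum(1 for w in weights.values() if w > 0)
    positive_share = positive / len(weights) if weights else 0.0

    return {
        "density": density,
        "clustering": clustering,
        "positive_share": positive_share,
    }


# Hypothetical toy story: three allies and one antagonist.
characters = ["A", "B", "C", "D"]
relations = [("A", "B", 2), ("B", "C", 1), ("A", "C", 1), ("C", "D", -1)]
stats = network_stats(characters, relations)
print(stats)
```

Under these assumptions, a story whose network scores high on all three values would match the "tightly knit, positive" pattern the abstract describes; comparing the values across many stories per source is the kind of aggregate analysis the abstract refers to.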
Oct-23-2025
- Country:
- Europe
- France (0.04)
- Ireland > Leinster
- County Dublin > Dublin (0.04)
- Portugal > Lisbon
- Lisbon (0.04)
- Sweden
- Uppsala County > Uppsala (0.04)
- Vaestra Goetaland > Gothenburg (0.04)
- North America > United States
- Florida > Miami-Dade County
- Miami (0.04)
- Pennsylvania > Allegheny County
- Pittsburgh (0.04)
- Genre:
- Research Report > New Finding (1.00)
- Technology: