What Makes a Good Story and How Can We Measure It? A Comprehensive Survey of Story Evaluation
arXiv.org Artificial Intelligence
With the development of artificial intelligence, particularly the success of Large Language Models (LLMs), the quantity and quality of automatically generated stories have increased significantly. This has created a need for automatic story evaluation, both to assess the generative capabilities of computing systems and to analyze the quality of automatically generated and human-written stories. Evaluating a story can be more challenging than other generation evaluation tasks. While tasks like machine translation primarily assess fluency and accuracy, story evaluation demands additional, more complex measures such as overall coherence, character development, and interestingness. This calls for a thorough review of the relevant research. In this survey, we first summarize existing storytelling tasks, including text-to-text, visual-to-text, and text-to-visual. We highlight their evaluation challenges, identify the human criteria used to measure stories, and present existing benchmark datasets. We then propose a taxonomy to organize evaluation metrics that have been developed or can be adopted for story evaluation. We also describe these metrics and discuss their merits and limitations. Next, we discuss human-AI collaboration for story evaluation and generation. Finally, we suggest potential future research directions, extending from story evaluation to general evaluation.
Aug-26-2024
- Country:
- South America > Colombia
- Meta Department > Villavicencio (0.04)
- Oceania > Australia
- North America
- Dominican Republic (0.04)
- United States
- Michigan > Washtenaw County
- Ann Arbor (0.14)
- Minnesota > Hennepin County
- Minneapolis (0.28)
- Nevada > Clark County
- Las Vegas (0.04)
- Arizona > Maricopa County
- Scottsdale (0.04)
- Massachusetts > Suffolk County
- Boston (0.04)
- Hawaii > Honolulu County
- Honolulu (0.04)
- Louisiana > Orleans Parish
- New Orleans (0.04)
- Pennsylvania > Philadelphia County
- Philadelphia (0.04)
- Utah > Salt Lake County
- Salt Lake City (0.04)
- Washington > King County
- California
- San Diego County > San Diego (0.04)
- Los Angeles County > Long Beach (0.04)
- New York > New York County
- New York City (0.04)
- Puerto Rico > San Juan
- San Juan (0.04)
- Mexico > Mexico City
- Mexico City (0.04)
- Canada
- Quebec > Montreal (0.04)
- Ontario
- Toronto (0.04)
- National Capital Region > Ottawa (0.04)
- British Columbia > Metro Vancouver Regional District
- Vancouver (0.14)
- Alberta > Census Division No. 15
- Improvement District No. 9 > Banff (0.14)
- Europe
- Czechia > Prague (0.04)
- Sweden > Stockholm
- Stockholm (0.04)
- Netherlands > North Holland
- Amsterdam (0.04)
- Middle East > Malta
- Eastern Region > Northern Harbour District > St. Julian's (0.04)
- Germany
- Hamburg (0.04)
- Bavaria > Upper Bavaria
- Munich (0.04)
- Portugal > Lisbon
- Lisbon (0.14)
- France
- Provence-Alpes-Côte d'Azur > Alpes-Maritimes
- Nice (0.04)
- Hauts-de-France > Nord
- Lille (0.04)
- Italy
- Tuscany > Florence (0.04)
- Piedmont > Turin Province
- Turin (0.04)
- Spain > Catalonia
- Barcelona Province > Barcelona (0.04)
- Finland > Uusimaa
- Helsinki (0.04)
- Ireland > Leinster
- County Dublin > Dublin (0.04)
- Belgium > Brussels-Capital Region
- Brussels (0.04)
- Asia
- Indonesia > Bali (0.04)
- Singapore (0.04)
- India (0.04)
- Taiwan > Taiwan Province
- Taipei (0.04)
- Myanmar > Tanintharyi Region
- Dawei (0.04)
- Middle East
- Jordan (0.04)
- UAE > Abu Dhabi Emirate
- Abu Dhabi (0.04)
- Israel > Tel Aviv District
- Tel Aviv (0.04)
- Japan > Honshū
- Kantō > Tokyo Metropolis Prefecture
- Tokyo (0.14)
- Kansai > Kyoto Prefecture
- Kyoto (0.04)
- China
- Africa > Ethiopia
- Addis Ababa > Addis Ababa (0.04)
- Genre:
- Overview (1.00)
- Research Report > New Finding (0.45)
- Industry:
- Media (1.00)
- Leisure & Entertainment (0.67)
- Technology: