Examining the arc of 100,000 stories: a tidy analysis

May 6, 2017, 15:20:13 GMT (@machinelearnbot)

I recently came across a great natural language dataset from Mark Riedl: 112,000 plots of stories downloaded from English-language Wikipedia. This includes books, movies, TV episodes, and video games: anything that has a Plot section on a Wikipedia page. This offers a great opportunity to analyze story structure quantitatively. In this post I'll do a simple analysis, examining which words tend to occur at particular points within a story, including words that characterize the beginning, middle, or end. As I usually do for text analysis, I'll be using the tidytext package that Julia Silge and I developed last year.
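To make the idea of "where a word occurs within a story" concrete, here is a minimal sketch in plain Python. It is not the tidytext (R) code used for the analysis itself. The function names, the toy plots, and the choice of median relative position as the summary statistic are all assumptions made for illustration.

```python
import re
from collections import defaultdict
from statistics import median


def word_positions(stories):
    """Map each word to the relative positions (0 = start, 1 = end) where it occurs."""
    positions = defaultdict(list)
    for text in stories:
        # Crude tokenizer: lowercase runs of letters and apostrophes.
        words = re.findall(r"[a-z']+", text.lower())
        n = len(words)
        if n < 2:
            continue  # a one-word "story" has no meaningful positions
        for i, word in enumerate(words):
            positions[word].append(i / (n - 1))
    return positions


def median_positions(stories, min_count=2):
    """Median relative position of every word seen at least min_count times."""
    positions = word_positions(stories)
    return {w: median(p) for w, p in positions.items() if len(p) >= min_count}


# Hypothetical toy plots, standing in for the Wikipedia plot descriptions.
toy_stories = [
    "A young farmer discovers a map. He travels far. The villain dies and they reunite.",
    "A young girl meets a wizard. She trains hard. The kingdom is saved and they reunite.",
]

result = median_positions(toy_stories)
for word, pos in sorted(result.items(), key=lambda kv: kv[1]):
    print(f"{word:10s} {pos:.2f}")
```

Words with a median position near 0 characterize beginnings, and those near 1 characterize endings. On a real corpus you would want a much higher `min_count` and stop-word filtering before reading anything into the ranking.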

artificial intelligence, natural language, tidy analysis
