LASSO regression using tidymodels and #TidyTuesday data for The Office

Our modeling goal here is to predict the IMDb ratings for episodes of The Office from the other characteristics of the episodes in the #TidyTuesday dataset. There are two datasets: one with the ratings, and one with information such as the director, the writer, and which character spoke which line. Episode numbers and titles are not consistent between the two, so we can use regular expressions to normalize the titles and match the datasets up more reliably for joining. We are going to use several different kinds of features for modeling. First, let's find out how many times each character speaks per episode.
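As a rough illustration of the matching and counting steps described above, here is a minimal sketch using dplyr and stringr. The two small tibbles are made-up stand-ins for the real datasets, and the column names (`title`, `episode_name`, `character`, `imdb_rating`) and the helper `clean_title()` are assumptions for this example, not necessarily what the actual data or analysis uses.

```r
library(dplyr)
library(stringr)

# Hypothetical helper: normalize episode titles so both datasets share a key.
# Lowercases, drops punctuation and digits, drops the word "part",
# then collapses leftover whitespace.
clean_title <- function(x) {
  x %>%
    str_to_lower() %>%
    str_remove_all("[[:punct:][:digit:]]") %>%
    str_remove_all("\\bpart\\b") %>%
    str_squish()
}

# Toy stand-in for the ratings dataset (invented titles and values).
ratings <- tibble(
  title = c("Toy Episode (Part 1)", "Another One!"),
  imdb_rating = c(8.0, 9.0)
)

# Toy stand-in for the line-level dataset: one row per spoken line.
lines <- tibble(
  episode_name = c(
    "Toy Episode: Part 1", "Toy Episode: Part 1",
    "Toy Episode: Part 1", "Another One"
  ),
  character = c("Michael", "Michael", "Dwight", "Jim")
)

# Count how many lines each character speaks in each episode.
character_counts <- lines %>%
  mutate(episode_name = clean_title(episode_name)) %>%
  count(episode_name, character, name = "n_lines")

# Join the counts to the ratings on the cleaned title.
joined <- ratings %>%
  transmute(episode_name = clean_title(title), imdb_rating) %>%
  inner_join(character_counts, by = "episode_name")

print(joined)
```

The raw titles in the two toy tibbles differ in punctuation and "Part" formatting, so joining on them directly would fail; after cleaning, both reduce to the same key. On real data, the cleaning pattern would likely need tuning, and an `anti_join()` in each direction is a quick way to spot episodes that still fail to match.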