Predicting movie revenue with AdaBoost, XGBoost and LightGBM


Marvel's Avengers: Endgame recently dethroned Avatar as the highest-grossing movie in history. While there was never much doubt that this movie would be very successful, I want to understand what makes any given movie a success. I am using data from The Movie Database, provided through Kaggle. The data set is split into a train set containing 3,000 movies and a test set comprising 4,398 movies. The train set additionally contains the target variable, revenue.

Prequels and Sequels: Perhaps unsurprisingly, movies that are prequels or sequels to related movies earn a higher average revenue than standalone movies.
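The prequel/sequel comparison comes down to grouping movies by a yes/no flag and averaging revenue within each group. The following is a minimal plain-Python sketch of that step. It assumes each movie is a dict with a hypothetical `is_sequel` flag and a `revenue` field. In the Kaggle data such a flag would presumably be derived from a column like `belongs_to_collection`, but that is an assumption here. The sample numbers are made up for illustration and are not taken from the data set.

```python
from statistics import mean


def mean_revenue_by_group(movies, key):
    """Group movies by the value of `key` and return the mean revenue per group."""
    groups = {}
    for movie in movies:
        # Collect revenues under the flag value (e.g. True / False).
        groups.setdefault(movie[key], []).append(movie["revenue"])
    return {flag: mean(revenues) for flag, revenues in groups.items()}


# Hypothetical toy sample: made-up numbers, NOT from the Kaggle data set.
# `is_sequel` is an assumed, illustrative field name.
movies = [
    {"title": "A", "is_sequel": True, "revenue": 300_000_000},
    {"title": "B", "is_sequel": True, "revenue": 150_000_000},
    {"title": "C", "is_sequel": False, "revenue": 40_000_000},
    {"title": "D", "is_sequel": False, "revenue": 80_000_000},
]

print(mean_revenue_by_group(movies, "is_sequel"))
```

With pandas, the same idea would be a `groupby` on the flag column followed by `.mean()` on the revenue column.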

When you want to buy a new car, will you walk into the first dealership you see and buy one based on the dealer's advice? More likely, you would browse a few websites where people have posted their reviews and compare different car models, checking their features and prices. You would probably also ask your friends and colleagues for their opinions. In short, you wouldn't jump straight to a conclusion; instead, you would make a decision that takes other people's opinions into account. Ensemble models in machine learning work on a similar principle: they combine the predictions of multiple models to improve overall performance.
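In code, the simplest way to "ask several opinions" is to average the numeric predictions of several regressors, or to take a majority vote among classifiers. The sketch below uses three hypothetical, hand-written toy "models" only to show the combination step. Their rules are invented for illustration and are not fitted to any data. Boosting methods such as AdaBoost, XGBoost and LightGBM also combine many models. However, they train those models sequentially, with each new model focusing on the errors of the previous ones, and combine them as a weighted sum rather than a plain average.

```python
from collections import Counter
from statistics import mean


# Three hypothetical "models": each predicts revenue (in millions) from a
# movie's budget (in millions). The rules are made up purely for illustration.
def model_a(budget):
    return 2.0 * budget


def model_b(budget):
    return 1.5 * budget + 10


def model_c(budget):
    return 3.0 * budget - 20


def average_ensemble(models, x):
    """Regression ensemble: the final prediction is the mean of all model predictions."""
    return mean([m(x) for m in models])


def majority_vote(labels):
    """Classification ensemble: return the label predicted by the most models."""
    return Counter(labels).most_common(1)[0][0]


models = [model_a, model_b, model_c]
print(average_ensemble(models, 100))  # (200.0 + 160.0 + 280.0) / 3
print(majority_vote(["hit", "flop", "hit"]))  # two of three models say "hit"
```

Averaging smooths out the individual models' errors when those errors are not all in the same direction. This is the same intuition as weighing several reviewers' opinions instead of trusting a single dealer.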