Kaggle Ensembling Guide

Apr-5-2016, 09:32:37 GMT–#artificialintelligence

Model ensembling is a powerful technique for increasing accuracy on a variety of ML tasks. In this article I share my ensembling approaches for Kaggle competitions. The first part looks at creating ensembles from submission files. The second part looks at creating ensembles through stacked generalization and blending. I explain why ensembling reduces the generalization error. Finally, I show different methods of ensembling, together with their results and code to try out for yourself.

"This is how you win ML competitions: you take other peoples' work and ensemble them together."

The most basic and convenient way to ensemble is to combine Kaggle submission CSV files. You only need the predictions on the test set for these methods; there is no need to retrain a model. This makes it a quick way to ensemble already existing model predictions, ideal when teaming up.

Let's see why model ensembling reduces the error rate and why it works better to ensemble low-correlated model predictions.

During space missions it is very important that all signals are correctly relayed. One solution is an error-correcting code. The simplest error-correcting code is a repetition code: relay the signal multiple times in equally sized chunks and take a majority vote. Signal corruption is rare and often occurs in small bursts, so it is rarer still for the majority vote itself to be corrupted. As long as the corruption is not completely unpredictable (a 50% chance of occurring), signals can be repaired.

Suppose we have a test set of 10 samples. The ground truth is all positive ("1").
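The majority-vote argument can be made concrete with a minimal sketch. Everything specific here is an assumption for illustration and does not come from the article: the three hypothetical models, their 70% accuracy, and the hand-picked prediction vectors. The helper names `majority_vote_accuracy` and `majority_vote` are likewise made up for this example.

```python
from itertools import product


def majority_vote_accuracy(accuracies):
    """Probability that a strict majority of independent voters is correct.

    Enumerates every correct/incorrect combination, so it is only
    practical for a handful of voters.
    """
    n = len(accuracies)
    total = 0.0
    for outcome in product([True, False], repeat=n):
        if sum(outcome) * 2 > n:  # strict majority voted correctly
            p = 1.0
            for correct, acc in zip(outcome, accuracies):
                p *= acc if correct else 1.0 - acc
            total += p
    return total


def majority_vote(predictions):
    """Sample-wise majority vote over equal-length lists of 0/1 labels."""
    n_models = len(predictions)
    return [1 if sum(col) * 2 > n_models else 0 for col in zip(*predictions)]


# Assumed setup: 10 test samples, ground truth all positive.
ground_truth = [1] * 10

# Hypothetical models, each 70% accurate, with errors on different samples
# (hand-picked to show the low-correlation case at its most favourable).
model_a = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]
model_b = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
model_c = [1, 1, 1, 0, 0, 0, 1, 1, 1, 1]

ensemble = majority_vote([model_a, model_b, model_c])
ensemble_accuracy = sum(p == t for p, t in zip(ensemble, ground_truth)) / len(ground_truth)

# Expected accuracy of a 3-way vote if the models erred independently.
print(f"{majority_vote_accuracy([0.7, 0.7, 0.7]):.3f}")  # 0.784
print(ensemble_accuracy)  # 1.0
```

With independent errors, three 70% models are expected to reach about 78% under a majority vote. In the hand-picked case the errors never overlap, so the vote repairs every mistake; if the three models made identical errors, the vote would stay at 70%.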


