Random Forests and the Bias-Variance Tradeoff – Towards Data Science
The Random Forest is an extremely popular machine learning algorithm. Often, without much pre-processing, one can throw together a quick and dirty model with no hyperparameter tuning and achieve results that aren't awful. As an example, I recently put together a RandomForestRegressor in Python using scikit-learn for the New York City Taxi Fare Prediction playground competition on Kaggle, passing no arguments to the model constructor and using about 1/100 of the training data (554,238 of 55M rows), for a validation R² of 0.8. NOTE: The snippet assumes you have already split the data into training and validation sets, with your features and target variable separated. You can see the full code on my GitHub profile.
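For readers who want to try the idea, here is a minimal sketch of that kind of quick, untuned model. It is not the original competition code: the data is synthetic (generated with `make_regression` as a stand-in for the taxi fare data), the split is done explicitly with `train_test_split`, and `random_state` is set only so the run is repeatable. Everything else is left at scikit-learn's defaults, which vary by version (for example, the default `n_estimators` changed from 10 to 100 in version 0.22), so the score here says nothing about the 0.8 reported above.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption; this is NOT the Kaggle taxi fare data).
X, y = make_regression(n_samples=2000, n_features=6, noise=10.0, random_state=0)

# Hold out 20% of the rows as a validation set.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# No hyperparameters are tuned; random_state is passed only for reproducibility.
model = RandomForestRegressor(random_state=0)
model.fit(X_train, y_train)

# For scikit-learn regressors, score() returns R² on the data it is given.
val_r2 = model.score(X_val, y_val)
print(f"Validation R^2: {val_r2:.3f}")
```

Comparing `model.score(X_train, y_train)` with `val_r2` is a quick way to see how much better the forest fits the data it was trained on than the data it has not seen.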
Oct-11-2018, 14:19:33 GMT