This tutorial is part of the Machine learning for developers learning path. In this tutorial, we describe the basics of solving a classification-based machine learning problem, and give you a comparative study of some of the current most popular algorithms. In the open Notebook, click Run to run the cells one at a time. The rest of the tutorial follows the order of the Notebook. Classification is when the feature to be predicted contains categories of values.
This post originally appeared on the Yhat blog. Yhat is a Brooklyn based company whose goal is to make data science applicable for developers, data scientists, and businesses alike. Yhat provides a software platform for deploying and managing predictive algorithms as REST APIs, while eliminating the painful engineering obstacles associated with production environments like testing, versioning, scaling and security. It can be used to on customer acquisition, retention, and churn or to in patients. Random forest is capable of regression and classification. It can handle a large number of features, and it's helpful for estimating which of your variables are important in the underlying data being modeled.
Random forest is a highly versatile machine learning method with numerous applications ranging from marketing to healthcare and insurance. It can be used to model the impact of marketing on customer acquisition, retention, and churn or to predict disease risk and susceptibility in patients. Random forest is capable of regression and classification. It can handle a large number of features, and it's helpful for estimating which of your variables are important in the underlying data being modeled. Random forest is solid choice for nearly any prediction problem (even non-linear ones).
For data scientists, a key part of interpreting machine learning models is understanding which factors impact predictions. In order to effectively use machine learning in their decision-making processes, companies need to know which factors are most important. For example, if a company wants to predict the likelihood of customer churn, it might also want to know what exactly drives a customer to leave a company. In this example, the model might indicate that customers who purchase products that rarely go on sale are much more likely to stop purchasing. Armed with this knowledge, a company can make smarter pricing decisions in the future.