You are going to learn one of the most popular classification algorithms: the random forest algorithm. As motivation to go further, here is one of random forest's best advantages: the same algorithm works for both classification and regression. You might think I am kidding, but it is true: we can use the same random forest algorithm for both classification and regression.
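To make this concrete, here is a minimal sketch (assuming scikit-learn and a synthetic toy dataset) showing the same forest machinery applied to both tasks; only the leaf aggregation differs (majority vote for classification, mean for regression):

```python
# Sketch: one algorithm, two tasks. Data here is synthetic for illustration.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))

# Classification: predict a binary label.
y_cls = (X[:, 0] + X[:, 1] > 0).astype(int)
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y_cls)

# Regression: predict a continuous target with the same forest machinery.
y_reg = 2.0 * X[:, 0] + X[:, 2]
reg = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y_reg)

print(clf.predict(X[:5]))  # class labels (0/1)
print(reg.predict(X[:5]))  # continuous values
```

The only change between the two fits is the estimator class; the hyperparameters and API are otherwise identical.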
Five simple soft sensor methodologies with two update conditions were compared on two experimentally obtained datasets and one simulated dataset. The soft sensors investigated were moving window partial least squares regression (and a recursive variant), moving window random forest regression, the mean moving window of $y$, and a novel random forest partial least squares regression ensemble (RF-PLS), all of which can be used with small sample sizes so that they can be rapidly placed online. It was found that, on two of the datasets studied, small window sizes led to the lowest prediction errors for all of the moving window methods studied. On the majority of datasets studied, the RF-PLS calibration method offered the lowest one-step-ahead prediction errors compared to those of the other methods, and it demonstrated greater predictive stability at larger time delays than moving window PLS alone. It was found that both the random forest and RF-PLS methods most adequately modeled the datasets that did not feature purely monotonic increases in property values, but that both methods performed more poorly than moving window PLS models on one dataset with purely monotonic property values. Other data-dependent findings are presented and discussed.
Having tried logistic regression the first time around, I moved on to decision trees and KNN. Unfortunately, those models performed horribly and had to be scrapped. Random Forest seemed to be the buzzword around the Kaggle forums, so I obviously had to try it next. I took a couple of days to read up on it and worked out a few examples on my own before taking another stab at the Titanic dataset. The 'caret' package is a beauty.
The Jupyter Notebook can be found here. There is no template for solving a data science problem, but we do see similar steps in many different projects. I wanted to make a clean workflow to serve as an example for aspiring data scientists. I also wanted to give people working with data scientists an easy-to-understand guide to data science. This is a high-level overview, and every step (and almost every sentence) in it can be addressed on its own. Many books, like An Introduction to Statistical Learning by James, Witten, Hastie, and Tibshirani, and many courses, like Andrew Ng's Machine Learning course at Stanford, go into these topics in more detail. The data science community is full of great literature and great resources.
If you are not aware of the concepts behind the decision tree classifier, please spend some time on the articles below, as you need to know how the decision tree classifier works before learning how the random forest algorithm works. Given a training dataset with targets and features, the decision tree algorithm comes up with a set of rules. In the decision tree algorithm, choosing these nodes and forming the rules is done using information gain and Gini index calculations. In the random forest algorithm, each tree is instead grown on a random bootstrap sample of the data, and at each node only a random subset of the features is considered for splitting; within that subset, the best split is still chosen by information gain or the Gini index.
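The two ingredients above can be sketched in a few lines: the Gini index used to score a split, and the random feature subset a forest considers at each node. This is an illustrative sketch (function names and the sqrt(n_features) subset size are common conventions, not taken from the text):

```python
# Gini impurity of a node, weighted Gini of a split, and the random
# feature subset a random forest draws at each node.
import numpy as np

def gini(labels):
    """Gini impurity: 1 - sum over classes of p_k^2."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - float(np.sum(p ** 2))

def split_gini(y_left, y_right):
    """Weighted Gini impurity of a candidate split."""
    n = len(y_left) + len(y_right)
    return (len(y_left) * gini(y_left) + len(y_right) * gini(y_right)) / n

y = np.array([0, 0, 1, 1, 1, 0])
print(gini(y))                   # parent node impurity -> 0.5
print(split_gini(y[:2], y[2:]))  # candidate split score -> 0.25 (lower is purer)

# Random forest: at each node, only a random subset of features is scored,
# commonly sqrt(n_features) of them, rather than all features.
rng = np.random.default_rng(0)
n_features = 9
candidates = rng.choice(n_features, size=int(np.sqrt(n_features)), replace=False)
print(candidates)
```

A plain decision tree would evaluate every feature at every node; the random subset is what decorrelates the trees in the forest.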