Let's come straight to the point on this one – there are only two types of variables: continuous and discrete. Discrete variables can be further divided into nominal (categorical) and ordinal. We did a post on how to handle categorical variables last week, so you might expect a similar post on continuous variables. You are right – in this article, we will explain all the ways a beginner can handle continuous variables while doing machine learning or statistical modeling. But before we actually start, first things first: simply put, if a variable can take any value between its minimum and maximum value, it is called a continuous variable.
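To make this concrete, here is a minimal sketch of one of the most common ways to handle a continuous variable: binning it into ordinal groups. The ages, cut-offs, and labels are all invented for illustration.

```python
import numpy as np

# Hypothetical ages: a continuous variable that can take any value in its range
ages = np.array([23.0, 45.5, 31.2, 67.8, 52.1, 19.4])

# Bin the continuous variable into ordinal buckets (cut-offs are illustrative)
bin_edges = [30, 50]  # young: < 30, middle: 30-50, senior: >= 50
labels = np.array(["young", "middle", "senior"])

# np.digitize returns the index of the bin each age falls into
groups = labels[np.digitize(ages, bin_edges)]
print(groups)
```

Binning trades precision for interpretability, which is often a reasonable deal for a beginner exploring a new data set.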

Quantum computing is around the corner, and its uses in machine learning are already being hypothesised [2]. Big Data is involved in virtually every industry, from health care to finance. Each industry and task requires a different approach to how data is utilised. One thing that remains consistent is the increasing volume, velocity and variety of data (the 3 V's of Big Data) and the challenge of putting as much data as possible to use in real-time, effective solutions [3]. The involvement of machine learning has undoubtedly revolutionised this process, leading to far more accurate models and solutions to challenges in more unique ways [4].

With the increase in computational power, we can now choose algorithms that perform very intensive calculations. One such algorithm is "Random Forest", which we will discuss in this article. The algorithm is very popular in various competitions. Before going any further, here is an example of the importance of choosing the best algorithm. Yesterday, I saw a movie called "Edge of Tomorrow".
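As a quick, hedged sketch of what using Random Forest looks like in practice (assuming scikit-learn and a synthetic dataset, since no real data is given here):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic classification data, purely for illustration
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# A Random Forest: many decision trees, each trained on a bootstrap sample,
# with each split considering a random subset of the features
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
acc = clf.score(X_test, y_test)
print(f"held-out accuracy: {acc:.3f}")
```

The intensive part is hidden in `n_estimators`: a hundred trees are grown and their votes averaged, which is exactly the kind of computation that was impractical a couple of decades ago.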

Bagging and Boosting are similar in that they are both ensemble techniques, where a set of weak learners is combined to create a strong learner that obtains better performance than any single one. So, let's start from the beginning: an ensemble is a machine learning concept in which the idea is to train multiple models using the same learning algorithm. Ensembles belong to a bigger group of methods, called multiclassifiers, where a set of hundreds or thousands of learners with a common objective are fused together to solve the problem. The second group of multiclassifiers contains the hybrid methods. They also use a set of learners, but these can be trained using different learning techniques.
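To illustrate the bagging side of this, here is a minimal scikit-learn sketch on synthetic data: the weak learners are depth-1 decision "stumps", each trained on a bootstrap sample of the same data and combined by voting. The dataset and parameters are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, purely for illustration
X, y = make_classification(n_samples=400, n_features=8, random_state=0)

# Weak learner: a depth-1 decision tree ("stump")
stump = DecisionTreeClassifier(max_depth=1, random_state=0)
stump_acc = stump.fit(X, y).score(X, y)

# Bagging: 50 stumps, each fit on a bootstrap sample, combined by voting
bagged = BaggingClassifier(
    DecisionTreeClassifier(max_depth=1), n_estimators=50, random_state=0
)
bagged_acc = bagged.fit(X, y).score(X, y)
print(f"single stump: {stump_acc:.3f}, bagged ensemble: {bagged_acc:.3f}")
```

Boosting follows the same "many weak learners" idea but trains them sequentially, with each new learner focusing on the examples the previous ones got wrong.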

The best-known optimization clustering algorithm is k-means clustering. Unlike hierarchical clustering methods, which require processing time proportional to the square or cube of the number of observations, the time required by the k-means algorithm is proportional to the number of observations. This means that k-means clustering can be used on larger data sets. In fact, k-means clustering is inappropriate for small (under 100 observations) data sets. If the data set is small, the k-means solution becomes sensitive to the order in which the observations appear (the order effect).
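A minimal k-means sketch with scikit-learn, using a synthetic data set of two well-separated blobs (the data and parameters are illustrative; note that scikit-learn's `n_init` restarts the algorithm from several initializations precisely to reduce the kind of sensitivity described above):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two well-separated synthetic blobs of 100 points each
X = np.vstack([
    rng.normal(0, 0.5, (100, 2)),
    rng.normal(5, 0.5, (100, 2)),
])

# k-means with k=2; n_init=10 runs the algorithm from 10 random starts
# and keeps the best solution
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = km.labels_
print(km.cluster_centers_)
```

Each iteration only needs one pass over the observations to reassign them to the nearest centroid, which is why the cost grows linearly with the number of observations.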

But suppose every piece is the same shape and is small enough to make images confusing at first look. You'd take a guess at how they fit, right? Data can be that way. Fortunately, analysts are finding many advanced ways to bring data together. One technique receiving attention these days is clustering, an unsupervised machine learning method that calculates how unlabeled data should be grouped.

DBSCAN is a different type of clustering algorithm with some unique advantages. As the name indicates, this method focuses on the proximity and density of observations to form clusters. This is very different from KMeans, where an observation becomes part of the cluster represented by the nearest centroid. DBSCAN can also identify outliers: observations that do not belong to any cluster. Since DBSCAN identifies the number of clusters as well, it is very useful for unsupervised learning when we don't know how many clusters the data might contain.
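Here is a small illustrative sketch of both points, assuming scikit-learn and synthetic data: DBSCAN finds the number of clusters on its own, and it flags the isolated point as an outlier (label -1). The `eps` and `min_samples` values are assumptions tuned to this toy data.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
# Two dense synthetic blobs plus one far-away outlier
X = np.vstack([
    rng.normal(0, 0.3, (50, 2)),
    rng.normal(5, 0.3, (50, 2)),
    [[20.0, 20.0]],  # isolated point, should be labeled as noise
])

# eps: neighborhood radius; min_samples: points needed to form a dense region
db = DBSCAN(eps=1.0, min_samples=5).fit(X)
labels = db.labels_

# Noise points get the label -1, so exclude it when counting clusters
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print(f"clusters found: {n_clusters}")
```

Note that we never told DBSCAN how many clusters to look for; the two blobs and the single noise point fall out of the density criterion alone.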

At the heart of nearly every foaming news article starting with the words "AI knows ..." is some machine learning paper exploiting this basic realization. Generally, to train a machine learning model -- the thing that's eventually tasked with making predictions from previously unseen observations -- we take these giant matrices and add our own labels to them. These labels represent our ground truth or human-defined truth, what we know to be true (or what we say to be true; critics of the paper rightly pointed out that the "gay" images in both the training data set and the test data set were not representative of gay people) about the data that we already have. What the researchers wound up with is a dataset where each item consisted of 4,096 independent variables (extracted facial features, such as nose shape and grooming style) and one dependent variable (sexual orientation).
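To make that setup concrete, here is a hedged sketch of such a dataset: a matrix with 4,096 independent variables per item and one binary dependent variable, fed to a simple classifier. The data below is random noise and the choice of logistic regression is an assumption for illustration, not the paper's actual pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
# Hypothetical stand-in: 200 items, each with 4096 extracted features
X = rng.normal(size=(200, 4096))
# One dependent variable per item: a human-assigned binary label
y = rng.integers(0, 2, size=200)

# Fit a classifier mapping features -> label
model = LogisticRegression(max_iter=1000).fit(X, y)
preds = model.predict(X)
```

The shape of the problem, not the specific model, is the point: whatever labels we attach become the "ground truth" the model learns to reproduce, including any bias baked into how the labels were collected.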

The system has a collection of interacting memory systems akin to the psychological concepts of short-term and long-term memory, which allows it to learn by observation and adapt based on the frequency of patterns and their relationships through time. The association matrix enables the prediction of future observations based on current and previous observations. Each time a selection is made, there is a small chance that the observation will not be random but instead will be from a "pattern", a pair of observations representing some event to be learned. The sampling process is repeated 1000 times and processed by the system to generate an association matrix.
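The association-matrix idea can be sketched in a few lines. Everything concrete here is an assumption for illustration: the alphabet size, the injected "pattern" pair, and the 10% injection chance are invented, and the matrix is just a successor-count table.

```python
import numpy as np

rng = np.random.default_rng(0)
n_symbols = 5
pattern = (1, 3)  # hypothetical "event": a 1 tends to be followed by a 3

# Sample ~1000 observations; with a small chance, emit the pattern pair
# instead of a single random symbol
obs = []
while len(obs) < 1000:
    if rng.random() < 0.1:
        obs.extend(pattern)
    else:
        obs.append(int(rng.integers(n_symbols)))

# Association matrix: counts[a, b] = how often symbol b followed symbol a
counts = np.zeros((n_symbols, n_symbols), dtype=int)
for a, b in zip(obs, obs[1:]):
    counts[a, b] += 1

# Predict the next observation as the most frequent successor of the current one
predicted_after_1 = int(np.argmax(counts[1]))
print(f"after seeing a 1, predict: {predicted_after_1}")
```

Even with only a 10% injection rate, the pattern dominates the row for symbol 1, so the matrix "learns" the event purely from the frequency of co-occurrence.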