This blog post gives an overview of how to build a Machine Learning model, covering data pre-processing, splitting the data into training and test sets, regression/classification, and finally model evaluation. Machine Learning (ML) is a branch of artificial intelligence based on the idea that systems can learn from data, identify patterns, and make decisions. ML systems are trained rather than explicitly programmed. In this post we will implement various ML models with the help of scikit-learn (sklearn), a simple open-source Machine Learning library. It provides efficient tools for data analysis, data pre-processing, model building, model evaluation, and much more.
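To make the four stages concrete, here is a minimal sketch of pre-processing, splitting, fitting, and evaluating with scikit-learn. The data is synthetic and invented purely for illustration, and the choice of `StandardScaler` plus `LinearRegression` is an assumption, not the only reasonable pipeline.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption): y depends linearly on two features, plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Hold out 25% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Pre-processing (feature scaling) and the model, chained into one estimator.
# The scaler is fitted on the training data only.
model = make_pipeline(StandardScaler(), LinearRegression())
model.fit(X_train, y_train)

# Evaluate on data the model has not seen.
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"MSE: {mse:.4f}, R^2: {r2:.4f}")
```

Wrapping the scaler and the model in one pipeline keeps the test set out of the pre-processing step, which is one common way to avoid leaking test information into training.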
A quick exploratory look at the data will also reveal any missing values or outliers. For more information, and for functions you can use, read the beginner's guide to exploratory data analysis. Both missing values and outliers are a concern for Machine Learning models: outliers tend to pull the result towards extreme values, while missing values can bias it or stop some models from training at all.
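One way to check for both problems is sketched below with pandas. The column names and values are hypothetical, and the 1.5 × IQR rule used to flag outliers is a common convention rather than something this post prescribes.

```python
import pandas as pd

# Hypothetical toy data: one missing store size and one suspiciously large sale.
df = pd.DataFrame({
    "store_size": [120.0, 150.0, None, 130.0, 140.0, 125.0],
    "sales": [200.0, 220.0, 210.0, 215.0, 5000.0, 205.0],
})

# Summary statistics: count, mean, std, min, quartiles, max per column.
print(df.describe())

# Number of missing values in each column.
missing_per_column = df.isna().sum()
print(missing_per_column)

# Flag outliers in "sales" with the interquartile-range (IQR) rule.
q1 = df["sales"].quantile(0.25)
q3 = df["sales"].quantile(0.75)
iqr = q3 - q1
is_outlier = (df["sales"] < q1 - 1.5 * iqr) | (df["sales"] > q3 + 1.5 * iqr)
outliers = df[is_outlier]
print(outliers)
```

What to do with the flagged rows (drop, cap, impute, or keep) depends on the data and the model, so treat the flags as a prompt to investigate rather than an automatic filter.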
I was talking to one of my friends, who happens to be an operations manager at one of the supermarket chains in India. During our discussion, we started talking about the amount of preparation the chain needs to do before the Indian festive season (Diwali) kicks in. He told me how critical it is for them to estimate which products will sell like hot cakes and which will not, before they purchase stock. A bad decision can send your customers to competitors' stores in search of offers and products. The challenge does not end there: you need to estimate the sales of products across a range of categories, for stores in varied locations, and for consumers with different consumption habits.
While my friend was describing the challenge, the data scientist in me started smiling! I had just figured out a potential topic for my next article. In today's article, I will tell you everything you need to know about regression models and how they can be used to solve prediction problems like the one mentioned above.
Take a moment to list all the factors you can think of on which the sales of a store depend. For each factor, create a hypothesis about why and how that factor would influence the sales of various products. For example, I expect the sales of products to depend on the location of the store, because the local residents in each area have different lifestyles. The amount of bread a store sells in Ahmedabad would be a fraction of what a similar store sells in Mumbai. Similarly, list all the other factors you can think of. The location of your shop, the availability of the products, the size of the shop, offers on a product, advertising done for a product, and placement in the store could be some of the features on which your sales depend.
Linear regression is one of the simplest machine learning techniques you can use. It is often useful as a baseline relative to more powerful techniques. Like all regressions, we wish to map some input X to some output Y. In linear regression the model is Y = aX + b; you may recall from your high school studies that this is just the equation for a straight line. When X is 1-D, that is, when Y has one explanatory variable, we call this "simple linear regression".
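To see what fitting a straight line involves, here is a small sketch of simple linear regression in plain Python, using the standard least-squares formulas for a line y = a·x + b with slope a and intercept b. The function name and the sample points are made up for illustration; in practice you would more likely reach for a library such as scikit-learn.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = sum of co-deviations of x and y / sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Illustrative points that lie exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]

slope, intercept = fit_simple_linear_regression(xs, ys)
print(f"slope={slope}, intercept={intercept}")  # slope=2.0, intercept=1.0

# Predict Y for a new X using the fitted line.
prediction = slope * 6.0 + intercept
print(prediction)  # 13.0
```

Because the sample points sit exactly on a line, the fit recovers it perfectly; with noisy real data the same formulas return the line that minimises the sum of squared vertical distances to the points.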