I was talking to a friend who is an operations manager at one of the supermarket chains in India. During our discussion, we got onto the amount of preparation the chain needs to do before the Indian festive season (Diwali) kicks in. He told me how critical it is for them to estimate, before purchasing stock, which products will sell like hot cakes and which will not. A bad decision can send your customers looking for offers and products in competitors' stores. The challenge does not end there: you need to estimate the sales of products across a range of categories, for stores in varied locations, and for consumers with different consumption habits.

While my friend was describing the challenge, the data scientist in me started smiling! I had just figured out a potential topic for my next article. In today's article, I will tell you everything you need to know about regression models and how they can be used to solve prediction problems like the one above.

Take a moment to list all the factors you can think of on which the sales of a store would depend. For each factor, create a hypothesis about why and how it would influence the sales of various products. For example, I expect the sales of products to depend on the location of the store, because the local residents in each area have different lifestyles. The amount of bread a store sells in Ahmedabad would be a fraction of what a similar store sells in Mumbai. Similarly, list all the other factors you can think of. The location of the shop, availability of the products, size of the shop, offers on the product, advertising done for a product, and placement in the store are some of the features on which your sales could depend.

Linear regression is one of the simplest machine learning techniques you can use. It is often useful as a baseline against which to compare more powerful techniques. As with all regression, we wish to map some input X to some output Y. In linear regression that mapping takes the form Y = aX + b, which you may recall from your high school studies is just the equation for a straight line. When X is 1-D, that is, when Y has one explanatory variable, we call this "simple linear regression".
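To make the one-variable case concrete, here is a minimal sketch of fitting a straight line by ordinary least squares in plain Python. The function names (`fit_simple_linear_regression`, `predict`) and the store-size and sales numbers are invented for illustration and are not real data; the fit uses the standard closed-form result that the slope is the covariance of x and y divided by the variance of x.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sum of squared deviations of x (proportional to the variance of x).
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")

    # Sum of cross deviations (proportional to the covariance of x and y).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Evaluate the fitted line at a new input x."""
    return slope * x + intercept


# Hypothetical data: store size (arbitrary units) vs. weekly sales (arbitrary units).
store_size = [1.0, 2.0, 3.0, 4.0, 5.0]
weekly_sales = [3.0, 5.0, 7.0, 9.0, 11.0]

slope, intercept = fit_simple_linear_regression(store_size, weekly_sales)
print(slope, intercept)                # 2.0 1.0
print(predict(6.0, slope, intercept))  # 13.0
```

The made-up points lie exactly on a line, so the fit recovers it exactly; real sales data would be noisy, and the fitted line would only approximate it. In practice you would probably reach for a library routine rather than hand-rolling this, but the arithmetic is the same.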

Data Science, or Machine Learning, is a scary topic. It's hard to know where to get started. It's hard even to find a good definition of what it does and what you have to do. As I've given a few ad hoc presentations on Machine Learning (focused on implementing it with Azure, though the basics apply to other platforms), I thought I'd take my random notes and present them as a primer. You don't need to be a rocket scientist to get started, but a basic understanding of linear algebra will be helpful.