Regularization is very popular in machine learning, yet many people still do not use it. One reason, I think, is the perceived complexity of the concept, so I decided to make it simple for all of us. In this article I will try to explain regularization in a way that is easy to understand and easy to use. While explaining the concept, I will also give practical details on how to implement regularization in R and SAS. In very simple terms, regularization refers to preventing overfitting by explicitly controlling model complexity.
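To make "explicitly controlling the model complexity" concrete, here is a minimal sketch in plain Python (used only for illustration; the practical details in this article target R and SAS). It assumes one particular form of regularization, an L2 (ridge-style) penalty, on a one-feature model with no intercept. The function name `ridge_slope`, the penalty strength `lam`, and the toy data are all made up for this example.

```python
# Minimal sketch, assuming an L2 (ridge-style) penalty on a one-feature,
# no-intercept model y ~ beta * x. Names and data are hypothetical.

def ridge_slope(x, y, lam):
    """Return the beta minimising sum((y - beta*x)^2) + lam * beta^2.

    Setting the derivative to zero gives
    beta = sum(x*y) / (sum(x^2) + lam).
    With lam = 0 this is the ordinary least-squares slope.
    """
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + lam)


# Toy data (made up for illustration)
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]

beta_ols = ridge_slope(x, y, lam=0.0)     # no penalty
beta_ridge = ridge_slope(x, y, lam=10.0)  # penalised: pulled towards zero

print("unpenalised slope:", beta_ols)
print("penalised slope:  ", beta_ridge)
```

The larger `lam` is, the more the coefficient is pulled towards zero, which is the sense in which the penalty limits how complex the fitted model can be.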
When we talk about regression, we often end up discussing linear and logistic regression. Did you know there are 7 types of regression? Linear and logistic regression are just the most loved members of the regression family. Last week, I saw a recorded talk at NYC Data Science Academy by Owen Zhang, currently ranked 3 on Kaggle and Chief Product Officer at DataRobot. He said, 'if you are using regression without regularization, you have to be very special!'. I hope you get what a person of his stature meant. I understood it very well and decided to explore regularization techniques in detail.
The following are the steps we will walk through together to build an understanding. In this post, we will consider linear regression as the algorithm, where the target variable 'y' is explained by 2 features, 'x1' and 'x2', whose coefficients are β1 and β2. First up, let's get some minor prerequisites out of the way so that we understand their use down the line. Optional: refer to Chapter 3 in the link below to gain an understanding of linear regression. In Fig 1(a) below, gradient descent is represented in three dimensions.
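As a rough companion to the setup above, the following sketch runs gradient descent on a linear model with two features and coefficients β1 and β2. It is written in plain Python for illustration only. Several things here are assumptions rather than part of the original text: the model has no intercept, the cost is the halved mean squared error, and the function name `gradient_descent`, the learning rate `lr`, the iteration count `n_iters`, and the toy data are all hypothetical.

```python
# Minimal sketch, assuming the model y ~ b1*x1 + b2*x2 (no intercept)
# and the cost J(b1, b2) = (1 / (2n)) * sum((y - b1*x1 - b2*x2)^2).

def gradient_descent(x1, x2, y, lr=0.05, n_iters=2000):
    """Estimate (b1, b2) by repeatedly stepping against the gradient of J."""
    n = len(y)
    b1, b2 = 0.0, 0.0  # start from the origin of the (b1, b2) plane
    for _ in range(n_iters):
        # Residuals of the current fit
        residuals = [yi - (b1 * a + b2 * b) for a, b, yi in zip(x1, x2, y)]
        # Partial derivatives of J with respect to b1 and b2
        g1 = -sum(r * a for r, a in zip(residuals, x1)) / n
        g2 = -sum(r * b for r, b in zip(residuals, x2)) / n
        # Move a small step downhill on the cost surface
        b1 -= lr * g1
        b2 -= lr * g2
    return b1, b2


# Toy data generated exactly from y = 3*x1 + 2*x2 (made up for illustration)
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 5.0]
y = [3.0 * a + 2.0 * b for a, b in zip(x1, x2)]

b1_hat, b2_hat = gradient_descent(x1, x2, y)
print("estimated coefficients:", b1_hat, b2_hat)
```

Each iteration moves the pair (β1, β2) a little further down the cost surface, which is what a three-dimensional picture of gradient descent shows: the two coefficients on the horizontal axes and the cost on the vertical axis. The step size `lr` is tuned to this toy data; a value that is too large would make the updates diverge.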