From both sides now: the math of linear regression ·
Linear regression is the most basic and the most widely used technique in machine learning; yet for all its simplicity, studying it can unlock some of the most important concepts in statistics. If you have a basic undestanding of linear regression expressed as \hat{Y} \theta_0 \theta_1X, but don't have a background in statistics and find statements like "ridge regression is equivalent to the maximum a posteriori (MAP) estimate with a zero-mean Gaussian prior" bewildering, then this post is for you. With a superficial goal of understanding that somewhat obtuse statement, its main objective is to explore the topic, starting from the standard formulation of linear regression, moving on to the probabilistic approach (maximum likelihood formulation) and from there to Bayesian linear regression. I'll use the \theta character throughout to refer to the coefficients (weights) of a regression model, either explicitly broken out as \theta_0 and \theta_1 for intercept and slope respectively, or just \theta referring to the vector of coefficients. I'll usually use the expression \theta Tx_i for the prediction a model gives at x_i, the assumption being that a 1 has been added to the vector of values at x_i . 1 In the single predictor case, we know that the least squares fit is the line that minimizes the sum of the squared distances between observed data and predicted values, i.e. it minimizes the Residual Sum of Squares (RSS): These residuals are pretty important in how we reason about our model.
Oct-16-2016, 15:01:54 GMT