ML From Scratch, Part 1: Linear Regression - OranLooney.com
To kick off this series, we will start with something simple yet foundational: linear regression via ordinary least squares. While not exciting, linear regression finds widespread use both as a standalone learning algorithm and as a building block in more advanced learning algorithms. The output layer of a deep neural network trained for regression with MSE loss, simple AR time series models, and the "local regression" part of LOWESS smoothing are all examples of linear regression being used as an ingredient in a more sophisticated model. Linear regression is also the "simple harmonic oscillator" of machine learning; that is to say, a pedagogical example that allows us to present deep theoretical ideas about machine learning in a context that is not too mathematically taxing. There is also the small matter of it being the most widely used supervised learning algorithm in the world, although how much weight that carries depends, I suppose, on where you sit on the "applied" to "theoretical" spectrum.

However, since I can already feel your eyes glazing over at such an introductory topic, we can spice things up a little by doing something that isn't often done in introductory machine learning: presenting the algorithm that [your favorite statistical software here] actually uses to fit linear regression models, namely QR decomposition. This seems to be commonly glossed over, either because it involves more linear algebra than can generally be assumed of the reader, or because the exact solution we will derive doesn't generalize well to other machine learning algorithms, not even to closely related variants such as regularized regression or robust regression.
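To make the idea concrete before any derivation, here is a minimal sketch of fitting ordinary least squares through a QR decomposition. It assumes NumPy is available and leans on `np.linalg.qr` for the factorization rather than building it from scratch. The function name `fit_ols_qr` and the synthetic data are illustrative assumptions, not code taken from the article.

```python
import numpy as np

def fit_ols_qr(X, y):
    """Least-squares coefficients via reduced QR: X = QR, then solve R beta = Q^T y."""
    # Reduced mode (the default): Q is n x p with orthonormal columns,
    # R is p x p and upper triangular.
    Q, R = np.linalg.qr(X)
    z = Q.T @ y
    p = R.shape[0]
    beta = np.zeros(p)
    # Back-substitution, working from the last coefficient to the first.
    for i in range(p - 1, -1, -1):
        beta[i] = (z[i] - R[i, i + 1:] @ beta[i + 1:]) / R[i, i]
    return beta

# Synthetic data (an assumption for illustration): intercept plus two features.
rng = np.random.default_rng(0)
n = 200
features = rng.normal(size=(n, 2))
X = np.column_stack([np.ones(n), features])
true_beta = np.array([1.0, 2.0, -3.0])
y = X @ true_beta + 0.1 * rng.normal(size=n)

beta_hat = fit_ols_qr(X, y)
print(beta_hat)
```

The sketch assumes `X` has full column rank, so that every diagonal entry of `R` is nonzero. Production libraries typically add column pivoting or other rank checks that are omitted here.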
Oct-19-2019, 18:09:41 GMT