Data Science of Variable Selection: A Review

Data scientists routinely stress over the "best" approach to variable selection, particularly when faced with massive amounts of information, a frequent occurrence these days. "Massive" by today's standards means terabytes of data and tens, if not hundreds, of millions of features or predictors. There are many reasons for this "stress," but the reality is that a single, canonical solution does not exist. There are as many approaches to selecting features as there are statisticians, since every statistician and their sibling has a point of view or a paper on the subject. For years, there have been rumors that Google uses all available features in building its predictive algorithms.