Variable Selection in the Presence of Massive Data
Data scientists are always stressing over the "best" approach to variable selection, particularly when faced with massive amounts of information, a not uncommon occurrence these days. "Massive" by today's standards means terabytes of data and tens, if not hundreds, of millions of features or predictors. There are many reasons for this, but the reality is that a single, canonical answer does not exist. There are as many approaches as there are statisticians, since every statistician and their sibling has a POV or a paper on the subject. For years, there have been rumors that Google uses all available features in building its predictive algorithms.
Jun-4-2016, 09:30:34 GMT