Gradient-based optimization for variational empirical Bayes multiple regression

Saikat Banerjee, Peter Carbonetto, Matthew Stephens

arXiv.org Machine Learning 

Multiple linear regression provides a simple but widely used method to find associations between outcomes (responses) and a set of predictors (explanatory variables). It has been actively studied for more than a century, and there is a rich and vast literature on the subject [1]. In practical situations the number of predictor variables is often large, and it becomes desirable to induce sparsity in the regression coefficients to avoid overfitting [2, 3]. Sparse linear regression also serves as the foundation for non-linear techniques, such as trend filtering [4, 5], which can estimate an underlying non-linear trend from time series data. Sparse multiple linear regression and trend filtering arise in a wide range of applications in modern science and engineering, including astronomy [6], atmospheric sciences [7], biology [8], economics [9, 10], genetics [11-15], geophysics [16], medical sciences [17, 18], social sciences [19] and text analysis [20].

Approaches to sparse linear regression can be broadly classified into two groups: (a) penalized linear regression (PLR), which adds a penalty term to the likelihood to penalize the magnitude of the regression coefficients [21-23], and (b) Bayesian approaches [11-14, 24-29], which place a prior probability distribution on the model parameters to induce sparsity.
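To make the distinction concrete, here is a minimal sketch of each approach, assuming the standard Gaussian linear model $y = X\beta + e$ with $n$ observations and $p$ predictors; the lasso and the spike-and-slab prior below are illustrative choices from each group, not the specific methods developed in this paper. A PLR method such as the lasso computes a single penalized point estimate,

$$\hat{\beta} \;=\; \arg\min_{\beta \in \mathbb{R}^p} \; \tfrac{1}{2}\,\lVert y - X\beta \rVert_2^2 \;+\; \lambda \lVert \beta \rVert_1, \qquad \lambda > 0,$$

where the penalty weight $\lambda$ controls how strongly the coefficients are shrunk toward zero. A Bayesian approach instead places a sparsity-inducing prior on each coefficient, for example the spike-and-slab prior

$$\beta_j \;\sim\; \pi_0\, \delta_0 \;+\; (1 - \pi_0)\, \mathcal{N}(0, \sigma_\beta^2), \qquad j = 1, \dots, p,$$

where $\delta_0$ is a point mass at zero and $\pi_0$ is the prior probability that a predictor has no effect; inference then proceeds through the posterior distribution of $\beta$ rather than a single penalized point estimate.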