The Reciprocal Bayesian LASSO

arXiv.org Machine Learning

Throughout the paper, we assume that $y$ and $X$ have been centered at 0 so there is no intercept in the model, where $y$ is the $n \times 1$ vector of centered responses, $X$ is the $n \times p$ matrix of standardized regressors, $\beta$ is the $p \times 1$ vector of coefficients to be estimated, and $\epsilon$ is the $n \times 1$ vector of independent and identically distributed normal errors with mean 0 and variance $\sigma^2$. Compared to traditional penalization functions, which are usually symmetric about 0, continuous, and nondecreasing on $(0, \infty)$, the rLASSO penalty functions are decreasing on $(0, \infty)$, discontinuous at 0, and diverge to infinity as the coefficients approach zero. From a theoretical standpoint, rLASSO shares the same oracle property and the same rate of estimation error as other LASSO-type penalty functions. An early reference to this class of models can be found in Song and Liang (2015), with more recent papers focusing on large-sample asymptotics, along with computational strategies for frequentist estimation (Shin et al., 2018; Song, 2018). Our approach differs from this line of work in adopting a Bayesian perspective on rLASSO estimation. Ideally, a Bayesian solution can be obtained by placing appropriate priors on the regression coefficients that mimic the effect of the rLASSO penalty. As apparent from (1), this amounts to assuming a prior for $\beta$ that decomposes as a product of independent inverse Laplace (double exponential) densities: $\pi(\beta) = \prod_{j=1}^{p} \frac{\lambda}{2\beta_j^2} \exp\{-\lambda / |\beta_j|\} \, I\{\beta_j \neq 0\}$.
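To make the prior's behavior near zero concrete, the following is a minimal sketch (our own Python/NumPy code, not from the paper) that evaluates the log of the inverse Laplace prior above; the function name `rlasso_log_prior` and the parameter `lam` (standing in for $\lambda$) are illustrative choices, not the authors' notation.

```python
import numpy as np

def rlasso_log_prior(beta, lam):
    """Log of the independent inverse Laplace (rLASSO) prior
    pi(beta) = prod_j  lam / (2 beta_j^2) * exp(-lam / |beta_j|) * I(beta_j != 0).
    Returns -inf if any coefficient is exactly zero, mirroring the fact
    that the prior density vanishes (and the penalty diverges) at zero."""
    beta = np.asarray(beta, dtype=float)
    if np.any(beta == 0.0):
        return -np.inf
    abs_b = np.abs(beta)
    return np.sum(np.log(lam) - np.log(2.0) - 2.0 * np.log(abs_b) - lam / abs_b)

# Shrinking a coefficient toward zero makes the log prior plummet,
# unlike LASSO-type priors, which place their mode at zero.
print(rlasso_log_prior([1.0, 0.5], lam=1.0))    # moderate value
print(rlasso_log_prior([1.0, 1e-3], lam=1.0))   # far smaller: heavily penalized
```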


Bayesian $l_0$ Regularized Least Squares

arXiv.org Machine Learning

Bayesian $l_0$-regularized least squares provides a variable selection technique for high-dimensional predictors. The challenge in $l_0$ regularization is optimizing a non-convex objective function via a search over the model space consisting of all possible predictor combinations, an NP-hard task. Spike-and-slab (a.k.a. Bernoulli-Gaussian, BG) priors are the gold standard for Bayesian variable selection, with the caveat of limited computational speed and scalability. We show that a Single Best Replacement (SBR) algorithm is a fast, scalable alternative. Although SBR calculates a sparse posterior mode, we show that it possesses a number of equivalences and optimality properties of a posterior mean. To illustrate our methodology, we provide simulation evidence and a real data example on the statistical properties and computational efficiency of SBR versus direct posterior sampling using spike-and-slab priors. Finally, we conclude with directions for future research.
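To fix ideas, here is a minimal greedy sketch (our own Python/NumPy code, not the authors' implementation) of a single-best-replacement-style search for the $l_0$-penalized least-squares objective $\tfrac{1}{2}\|y - X\beta\|^2 + \lambda\|\beta\|_0$: starting from the empty support, it repeatedly applies the single variable insertion or removal that most decreases the objective and stops when no single change helps. Names such as `sbr` and `lam`, and details like the initialization and stopping tolerance, are illustrative assumptions rather than the paper's specification.

```python
import numpy as np

def sbr(X, y, lam, max_iter=100):
    """Greedy single-best-replacement-style search for the l0-penalized
    least-squares objective 0.5 * ||y - X beta||^2 + lam * ||beta||_0.
    Illustrative sketch only."""
    _, p = X.shape
    support = set()

    def objective(S):
        S = sorted(S)
        if not S:
            return 0.5 * float(y @ y)
        coef, *_ = np.linalg.lstsq(X[:, S], y, rcond=None)
        resid = y - X[:, S] @ coef
        return 0.5 * float(resid @ resid) + lam * len(S)

    current = objective(support)
    for _ in range(max_iter):
        best_j, best_val = None, current
        for j in range(p):                    # try every single insertion/removal
            val = objective(support ^ {j})    # flip membership of variable j
            if val < best_val - 1e-12:
                best_j, best_val = j, val
        if best_j is None:                    # no single replacement improves the fit
            break
        support ^= {best_j}
        current = best_val

    beta = np.zeros(p)
    if support:
        S = sorted(support)
        beta[S], *_ = np.linalg.lstsq(X[:, S], y, rcond=None)
    return beta, support

# Example usage on synthetic data:
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
beta_true = np.zeros(20)
beta_true[:3] = [3.0, -2.0, 1.5]
y = X @ beta_true + rng.normal(scale=0.5, size=100)
beta_hat, selected = sbr(X, y, lam=2.0)
```

Each iteration scans all $p$ single-coordinate changes, so the cost is roughly $p$ small least-squares solves per step; this brute-force scan is chosen here for clarity rather than efficiency.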


Generalized double Pareto shrinkage

arXiv.org Machine Learning

We propose a generalized double Pareto prior for Bayesian shrinkage estimation and inference in linear models. The prior can be obtained via a scale mixture of Laplace or normal distributions, forming a bridge between the Laplace and Normal-Jeffreys priors. While it has a spike at zero like the Laplace density, it also has a Student's $t$-like tail behavior. Bayesian computation is straightforward via a simple Gibbs sampling algorithm. We investigate the properties of the maximum a posteriori estimator, as sparse estimation plays an important role in many problems, reveal connections with some well-established regularization procedures, and show some asymptotic results. The performance of the prior is tested through simulations and an application.
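As an illustration of the scale-mixture construction mentioned above, the sketch below (our own Python/NumPy code) draws from a generalized double Pareto prior by mixing the rate of a Laplace density over a Gamma distribution; the parameter names `alpha` and `eta` are our own, and the exact parameterization used in the paper may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def gdp_draws(alpha, eta, size=10_000):
    """Draws from a generalized double Pareto prior via one common
    scale-mixture representation: a Laplace density whose rate parameter
    is itself Gamma(shape=alpha, rate=eta) distributed.
    Illustrative sketch; parameterization is an assumption."""
    lam = rng.gamma(shape=alpha, scale=1.0 / eta, size=size)   # Gamma-mixed Laplace rate
    return rng.laplace(loc=0.0, scale=1.0 / lam, size=size)    # Laplace draw given the rate

samples = gdp_draws(alpha=1.0, eta=1.0)
print(np.mean(np.abs(samples) > 5))   # noticeable mass far from zero: heavy, t-like tails
```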


Horseshoe Regularization for Machine Learning in Complex and Deep Models

arXiv.org Machine Learning

Since the advent of the horseshoe priors for regularization, global-local shrinkage methods have proved to be a fertile ground for the development of Bayesian methodology in machine learning, specifically for high-dimensional regression and classification problems. They have achieved remarkable success in computation, and enjoy strong theoretical support. Most of the existing literature has focused on the linear Gaussian case; see Bhadra et al. (2019) for a systematic survey. The purpose of the current article is to demonstrate that the horseshoe regularization is useful far more broadly, by reviewing both methodological and computational developments in complex models that are more relevant to machine learning applications. Specifically, we focus on methodological challenges in horseshoe regularization in nonlinear and non-Gaussian models; multivariate models; and deep neural networks. We also outline the recent computational developments in horseshoe shrinkage for complex models along with a list of available software implementations that allows one to venture out beyond the comfort zone of the canonical linear regression problems.
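For reference, the canonical horseshoe hierarchy in the linear Gaussian case surveyed in Bhadra et al. (2019) places half-Cauchy priors on local and global scales. The sketch below (our own Python/NumPy code, not from the article or any of the software it lists) draws coefficients from that prior; the function name and defaults are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def horseshoe_prior_draws(p, n_draws=10_000):
    """Draws of p coefficients from the canonical horseshoe prior:
    beta_j | lambda_j, tau ~ N(0, lambda_j^2 * tau^2),
    lambda_j ~ C+(0, 1),  tau ~ C+(0, 1).
    Half-Cauchy variates are obtained as absolute values of Cauchy draws."""
    tau = np.abs(rng.standard_cauchy(size=(n_draws, 1)))    # global shrinkage scale
    lam = np.abs(rng.standard_cauchy(size=(n_draws, p)))    # local shrinkage scales
    return rng.normal(loc=0.0, scale=lam * tau)             # conditionally Gaussian draws

draws = horseshoe_prior_draws(p=5)
```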


Generalized Beta Mixtures of Gaussians

arXiv.org Machine Learning

In recent years, a rich variety of shrinkage priors have been proposed that have great promise in addressing massive regression problems. In general, these new priors can be expressed as scale mixtures of normals, but have more complex forms and better properties than traditional Cauchy and double exponential priors. We first propose a new class of normal scale mixtures through a novel generalized beta distribution that encompasses many interesting priors as special cases. This encompassing framework should prove useful in comparing competing priors, considering properties and revealing close connections. We then develop a class of variational Bayes approximations through the new hierarchy presented here that will scale more efficiently to the types of truly massive data sets now encountered routinely.
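To illustrate the general idea of a beta-mixed normal scale mixture, the sketch below (our own Python/NumPy code) draws coefficients whose shrinkage weight follows a Beta distribution; the plain Beta(a, b) is a stand-in for the paper's three-parameter generalized beta, and the reduction and parameter correspondence are assumptions for illustration only. The special case a = b = 1/2 gives the horseshoe-shaped Beta(1/2, 1/2) density on the shrinkage weight.

```python
import numpy as np

rng = np.random.default_rng(0)

def beta_mixture_of_gaussians(a, b, p, n_draws=10_000):
    """Draws coefficients from a normal scale mixture in which the
    shrinkage weight kappa_j = 1 / (1 + psi_j) follows a Beta(a, b)
    distribution (a stand-in for the paper's generalized beta).
    With a = b = 1/2 the weights follow the horseshoe-shaped
    Beta(1/2, 1/2) density, recovering horseshoe-style shrinkage
    (with the global scale fixed at 1)."""
    kappa = rng.beta(a, b, size=(n_draws, p))         # shrinkage weights in (0, 1)
    psi = (1.0 - kappa) / kappa                       # implied prior variances
    return rng.normal(loc=0.0, scale=np.sqrt(psi))    # conditionally Gaussian draws

draws = beta_mixture_of_gaussians(a=0.5, b=0.5, p=3)
```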