Additive nonparametric regression models provide an attractive tool for variable selection in high dimensions when the relationship between the response and the predictors is complex. They offer greater flexibility than parametric nonlinear regression models, and better interpretability and scalability than fully nonparametric regression models. However, achieving sparsity simultaneously in the number of nonparametric components and in the variables within each component poses a stiff computational challenge. In this article, we develop a novel Bayesian additive regression model that combines hard and soft shrinkage to separately control the number of additive components and the variables within each component. We also develop an efficient algorithm to select the important variables and estimate the interaction network. The method shows excellent performance on both simulated and real data examples.
We propose the Bayesian bridge estimator for regularized regression and classification. We develop two key mixture representations for the Bayesian bridge model: (1) a scale mixture of normals with respect to an alpha-stable random variable; and (2) a mixture of Bartlett-Fejér kernels (or triangle densities) with respect to a two-component mixture of gamma random variables. Both lead to MCMC methods for posterior simulation, and these methods turn out to have complementary domains of maximum efficiency. The first representation is a well-known result due to West (1987), and is the better choice for collinear design matrices. The second representation is new, and is more efficient for orthogonal problems, largely because it avoids the need to deal with exponentially tilted stable random variables. It also provides insight into the multimodality of the joint posterior distribution, a feature of the bridge model that is notably absent under ridge or lasso-type priors. We prove a theorem that extends this representation to a wider class of densities representable as scale mixtures of betas, and provide an explicit inversion formula for the mixing distribution. We also explore connections with slice sampling and scale mixtures of normals. On the practical side, we find that the Bayesian bridge model outperforms its classical cousin in estimation and prediction across a variety of data sets, both simulated and real. We also show that the MCMC for fitting the bridge model exhibits excellent mixing properties, particularly for the global scale parameter. This contrasts favorably with analogous MCMC algorithms for other sparse Bayesian models. All methods described in this paper are implemented in the R package BayesBridge. An extensive set of simulation results is provided in two supplemental files.
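Representation (2) can be checked numerically with the following minimal sketch. It assumes the prior p(β | τ) ∝ exp(-|β/τ|^α) with 0 < α ≤ 1. The mixing weights (1 - α)/2 on Gamma(1 + 1/α, 1) and (1 + α)/2 on Gamma(2 + 1/α, 1) are our own derivation, not quoted from the paper. They come from the inversion g(s) = s² f''(s) for triangle kernels of half-width s, followed by the substitution s = ω^(1/α). All function names are hypothetical and unrelated to the BayesBridge package API.

```python
import math
import random

def bridge_density(beta, alpha, tau=1.0):
    """Bridge (exponential-power) prior: p(beta) proportional to exp(-|beta / tau|**alpha)."""
    return alpha / (2.0 * tau * math.gamma(1.0 / alpha)) * math.exp(-abs(beta / tau) ** alpha)

def gamma_pdf(w, shape):
    """Gamma(shape, rate=1) density at w > 0."""
    return w ** (shape - 1.0) * math.exp(-w) / math.gamma(shape)

def mixing_pdf(w, alpha):
    """Assumed two-component gamma mixture for omega; weights are nonnegative only for 0 < alpha <= 1."""
    return ((1.0 - alpha) / 2.0) * gamma_pdf(w, 1.0 + 1.0 / alpha) + (
        (1.0 + alpha) / 2.0) * gamma_pdf(w, 2.0 + 1.0 / alpha)

def triangle_pdf(beta, half_width):
    """Bartlett-Fejer (triangle) density on [-half_width, half_width]."""
    return max(0.0, 1.0 - abs(beta) / half_width) / half_width

def mixture_density(beta, alpha, tau=1.0, span=60.0, n=20000):
    """Integrate the triangle kernel (half-width tau * omega**(1/alpha)) against mixing_pdf by Simpson's rule."""
    lo = max(abs(beta / tau) ** alpha, 1e-12)  # the kernel vanishes for omega below this point
    h = span / n                               # n must be even for Simpson's rule
    total = 0.0
    for i in range(n + 1):
        w = lo + i * h
        f = triangle_pdf(beta, tau * w ** (1.0 / alpha)) * mixing_pdf(w, alpha)
        total += f * (1 if i in (0, n) else 4 if i % 2 else 2)
    return total * h / 3.0

def sample_bridge(alpha, tau=1.0, rng=random):
    """Draw omega from the gamma mixture, then beta from the triangle kernel it defines."""
    shape = 1.0 + 1.0 / alpha if rng.random() < (1.0 - alpha) / 2.0 else 2.0 + 1.0 / alpha
    s = tau * rng.gammavariate(shape, 1.0) ** (1.0 / alpha)
    return rng.triangular(-s, s, 0.0)

if __name__ == "__main__":
    print(mixture_density(0.7, 0.5), bridge_density(0.7, 0.5))
    rng = random.Random(42)
    draws = [sample_bridge(0.5, rng=rng) for _ in range(100000)]
    print(sum(abs(b) ** 0.5 for b in draws) / len(draws))
```

The two printed densities should agree to several decimal places. The sample average of |β|^0.5 should approach 1/α = 2, because |β/τ|^α follows a Gamma(1/α, 1) distribution under this prior. Conditioning on ω leaves a bounded triangle density, which is easy to sample with no stable random variables involved.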
The use of L1 regularisation for sparse learning has generated immense research interest, with successful applications in areas as diverse as signal acquisition, image coding, genomics and collaborative filtering. While existing work highlights the many advantages of L1 methods, in this paper we find that L1 regularisation often dramatically underperforms other methods for inferring sparsity in terms of predictive performance. We focus on unsupervised latent variable models and develop L1-minimising factor models, Bayesian variants of "L1", and Bayesian models with a stronger L0-like sparsity induced through spike-and-slab distributions. These spike-and-slab Bayesian factor models encourage sparsity while accounting for uncertainty in a principled manner and avoiding unnecessary shrinkage of non-zero values. We demonstrate on a number of data sets that, in practice, spike-and-slab Bayesian methods outperform L1 minimisation, even on a computational budget. We thus highlight the need to re-assess the widespread use of L1 methods in sparsity-reliant applications, particularly when we care about generalising to previously unseen data, and provide an alternative that delivers improved generalisation performance across many varying conditions.
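The phrase "avoiding unnecessary shrinkage of non-zero values" can be made concrete with a one-dimensional normal-means toy. This is a toy only, not the factor models studied in the paper. With a single observation y ~ N(θ, 1), the L1-penalised estimate minimising (y - θ)²/2 + λ|θ| is soft-thresholding. By contrast, a spike-and-slab prior θ ~ (1 - π)δ₀ + π N(0, v) gives a posterior mean that is near zero for small |y| but barely shrunk for large |y|. The values λ = 2, π = 0.2 and v = 100, and all function names, are hypothetical illustrative choices.

```python
import math

def normal_pdf(y, var):
    """Density of N(0, var) evaluated at y."""
    return math.exp(-0.5 * y * y / var) / math.sqrt(2.0 * math.pi * var)

def soft_threshold(y, lam):
    """L1-penalised estimate: argmin over theta of (y - theta)**2 / 2 + lam * |theta|."""
    return math.copysign(max(abs(y) - lam, 0.0), y)

def spike_slab_mean(y, pi=0.2, slab_var=100.0):
    """Posterior mean of theta for y ~ N(theta, 1), theta ~ (1 - pi) * delta_0 + pi * N(0, slab_var)."""
    slab = pi * normal_pdf(y, 1.0 + slab_var)      # evidence if theta was drawn from the slab
    spike = (1.0 - pi) * normal_pdf(y, 1.0)        # evidence if theta == 0
    incl = slab / (slab + spike)                   # posterior probability that theta != 0
    return incl * slab_var / (1.0 + slab_var) * y  # slab posterior mean is shrunk by v / (1 + v)

if __name__ == "__main__":
    for y in (0.5, 1.5, 3.0, 10.0):
        print(f"y={y:5.1f}  L1={soft_threshold(y, 2.0):6.3f}  spike-and-slab={spike_slab_mean(y):6.3f}")
```

With these settings, soft-thresholding shrinks a large observation such as y = 10 by the full λ = 2. The spike-and-slab posterior mean shrinks it only by the factor v/(1 + v) ≈ 0.99. Both methods pull small observations to zero or very close to it.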
In this work, we address the problem of solving a series of underdetermined linear inverse problems subject to a sparsity constraint. We generalize the spike-and-slab prior distribution to encode a priori correlation of the support of the solution in both space and time by imposing a transformed Gaussian process on the spike-and-slab probabilities. We derive an expectation propagation (EP) algorithm for posterior inference under the proposed model. For large-scale problems, the standard EP algorithm can be prohibitively slow, so we introduce three different approximation schemes to reduce the computational complexity. Finally, we demonstrate the proposed model in numerical experiments on both synthetic and real data sets.
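The generative side of the structured prior can be sketched as follows, under several stated assumptions. The sketch covers a single time step on a one-dimensional grid, so only spatial correlation appears. It uses a squared-exponential GP covariance and a probit (standard normal CDF) transform from GP values to spike-and-slab probabilities. Because nearby coefficients share similar inclusion probabilities, the sampled support tends to form contiguous runs. All names and parameter values are hypothetical, and the sketch omits the EP inference and its approximation schemes.

```python
import math
import random

def se_kernel(t1, t2, variance=1.0, lengthscale=5.0):
    """Squared-exponential covariance between grid positions t1 and t2 (hypothetical choice)."""
    return variance * math.exp(-0.5 * ((t1 - t2) / lengthscale) ** 2)

def cholesky(K, jitter=1e-6):
    """Lower-triangular L with L L^T = K + jitter * I (Cholesky-Banachiewicz)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(K[i][i] + jitter - s)
            else:
                L[i][j] = (K[i][j] - s) / L[j][j]
    return L

def probit(g):
    """Standard normal CDF: the assumed transform from a GP value to a probability."""
    return 0.5 * (1.0 + math.erf(g / math.sqrt(2.0)))

def sample_structured_spike_slab(n, m, slab_var=1.0, noise_sd=0.1, rng=None):
    """Draw (A, x, y) with y = A x + noise, where the support of x follows a probit-transformed GP."""
    rng = rng or random.Random(0)
    L = cholesky([[se_kernel(i, j) for j in range(n)] for i in range(n)])
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    gp = [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(n)]  # gp ~ N(0, K + jitter * I)
    probs = [probit(g) for g in gp]                                       # correlated inclusion probabilities
    support = [1 if rng.random() < p else 0 for p in probs]              # 0 = spike, 1 = slab
    x = [rng.gauss(0.0, math.sqrt(slab_var)) if s else 0.0 for s in support]
    A = [[rng.gauss(0.0, 1.0 / math.sqrt(m)) for _ in range(n)] for _ in range(m)]  # m < n: underdetermined
    y = [sum(a * xi for a, xi in zip(row, x)) + rng.gauss(0.0, noise_sd) for row in A]
    return A, x, y, support, probs

if __name__ == "__main__":
    A, x, y, support, probs = sample_structured_spike_slab(n=60, m=20)
    print("".join(str(s) for s in support))
```

Printing the support as a 0/1 string makes the clustering visible. Shrinking the hypothetical `lengthscale` toward zero recovers the usual independent spike-and-slab prior.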
We propose a new sparse Bayesian model for multi-task regression and classification. The model captures correlations between tasks, more specifically a low-rank approximation of the task covariance matrix, while remaining sparse in the features. We introduce a general family of group-sparsity-inducing priors based on matrix-variate Gaussian scale mixtures. We show that the amount of sparsity can be learnt from the data by combining an approximate inference approach with type II maximum likelihood estimation of the hyperparameters. Empirical evaluations on data sets from biology and vision demonstrate the applicability of the model: on both regression and classification tasks, it achieves predictive performance competitive with previously proposed methods.
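The following minimal sketch illustrates the prior structure only, not the paper's inference. Each row w_i of a d × T weight matrix holds one feature's weights across T tasks and is drawn as w_i ~ N(0, γ_i Σ). The task covariance Σ = VVᵀ + σ²I uses a low-rank factor V. A single scale γ_i multiplies the whole row, so a small γ_i shrinks that feature in every task at once, which produces the group-sparsity effect. The exponential mixing density, the rank, and all function names are hypothetical choices for illustration. The sketch omits the approximate inference and type II maximum likelihood steps.

```python
import math
import random

def task_covariance(V, sigma2):
    """Low-rank-plus-diagonal task covariance Sigma = V V^T + sigma2 * I (V is T x k)."""
    T, k = len(V), len(V[0])
    return [[sum(V[s][r] * V[t][r] for r in range(k)) + (sigma2 if s == t else 0.0)
             for t in range(T)] for s in range(T)]

def sample_row(scale, V, sigma2, rng):
    """One feature's weights across T tasks: w ~ N(0, scale * Sigma), built from the low-rank factor."""
    T, k = len(V), len(V[0])
    u = [rng.gauss(0.0, 1.0) for _ in range(k)]  # shared low-rank factor draw
    return [math.sqrt(scale) * (sum(V[t][r] * u[r] for r in range(k))
                                + math.sqrt(sigma2) * rng.gauss(0.0, 1.0)) for t in range(T)]

def sample_weights(d, V, sigma2, rng, mean_scale=1.0):
    """d x T weight matrix from a matrix-variate Gaussian scale mixture with one scale per row."""
    scales = [rng.expovariate(1.0 / mean_scale) for _ in range(d)]  # hypothetical mixing density
    return scales, [sample_row(g, V, sigma2, rng) for g in scales]

if __name__ == "__main__":
    rng = random.Random(0)
    V = [[1.0], [0.8], [-0.5]]  # hypothetical rank-1 factor for T = 3 tasks
    scales, W = sample_weights(8, V, 0.1, rng)
    for g, row in zip(scales, W):
        print(f"scale={g:6.3f}  row norm={math.sqrt(sum(w * w for w in row)):6.3f}")
```

Rows with small sampled scales come out small in every task simultaneously. A heavier-tailed mixing density would make this all-or-nothing pattern more pronounced.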