Horseshoe Regularization for Feature Subset Selection

arXiv.org Machine Learning

Feature subset selection arises in many high-dimensional applications of statistics, such as compressed sensing and genomics. The $\ell_0$ penalty is ideal for this task, with the caveat that it requires the NP-hard combinatorial evaluation of all models. A recent area of considerable interest is the development of efficient algorithms to fit models with a non-convex $\ell_\gamma$ penalty for $\gamma\in (0,1)$, which results in sparser models than the convex $\ell_1$ or lasso penalty, but is harder to fit. We propose an alternative, termed the horseshoe regularization penalty for feature subset selection, and demonstrate its theoretical and computational advantages. The distinguishing feature from existing non-convex optimization approaches is a full probabilistic representation of the penalty as the negative of the logarithm of a suitable prior, which in turn enables efficient expectation-maximization and local linear approximation algorithms for optimization and MCMC for uncertainty quantification. In synthetic and real data, the resulting algorithms provide better statistical performance, and the computation requires a fraction of the time of state-of-the-art non-convex solvers.
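To make the penalties named in the abstract concrete, the following is a minimal sketch of the $\ell_0$, $\ell_1$ and $\ell_\gamma$ penalties, together with the standard fact that the $\ell_1$ penalty is the negative log of a Laplace prior up to an additive constant. It does not implement the horseshoe penalty itself; the function names, the example vector and the `scale` parameter are assumptions made for illustration.

```python
import math

def l0_penalty(beta):
    """Count the nonzero coefficients."""
    return sum(1 for b in beta if b != 0)

def l_gamma_penalty(beta, gamma):
    """Sum of |b|^gamma; gamma = 1 gives the lasso penalty, 0 < gamma < 1 is non-convex."""
    return sum(abs(b) ** gamma for b in beta)

def neg_log_laplace_prior(beta, scale=1.0):
    """Negative log density of independent Laplace(0, scale) priors on each coefficient."""
    return sum(abs(b) / scale + math.log(2.0 * scale) for b in beta)

# Hypothetical coefficient vector, used only for illustration.
beta = [0.0, 0.5, -2.0]
print(l0_penalty(beta))            # number of selected features
print(l_gamma_penalty(beta, 1.0))  # lasso penalty
print(l_gamma_penalty(beta, 0.5))  # non-convex penalty with gamma = 0.5
# With scale = 1 the Laplace negative log prior differs from the lasso penalty
# only by the constant len(beta) * log(2).
print(neg_log_laplace_prior(beta) - len(beta) * math.log(2.0))
```

The same "penalty equals negative log prior" reading is what the abstract relies on for the horseshoe case, with a different prior in place of the Laplace density used here.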


Learning From What You Don't Observe

arXiv.org Artificial Intelligence

The process of diagnosis involves learning about the state of a system from various observations of symptoms or findings about the system. Sophisticated Bayesian (and other) algorithms have been developed to revise and maintain beliefs about the system as observations are made. Nonetheless, diagnostic models have tended to ignore some common-sense reasoning exploited by human diagnosticians; in particular, one can learn from which observations have not been made, in the spirit of conversational implicature. We describe two concepts for extracting information from the observations not made. First, some symptoms, if present, are more likely to be reported before others. Second, most human diagnosticians and expert systems are economical in their data-gathering, searching first where they are more likely to find symptoms present. Thus, there is a desirable bias toward reporting symptoms that are present. We develop a simple model for these concepts that can significantly improve diagnostic inference.
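The idea that an unreported symptom carries information can be illustrated with a small Bayes-rule sketch. This is not the paper's model: it assumes a single binary disease, a single binary symptom, and a hypothetical parameter `report_prob`, the probability that the symptom would already have been reported if it were present. All names and numbers are assumptions for illustration.

```python
def posterior_given_not_reported(prior, p_sym_given_d, p_sym_given_not_d, report_prob):
    """P(disease | symptom not reported), where an absent symptom is never reported
    and a present symptom is reported with probability report_prob."""
    # Probability of "no report" under each hypothesis.
    no_report_d = 1.0 - report_prob * p_sym_given_d
    no_report_not_d = 1.0 - report_prob * p_sym_given_not_d
    numerator = prior * no_report_d
    return numerator / (numerator + (1.0 - prior) * no_report_not_d)

# Hypothetical numbers: the symptom is common with the disease and rare without it.
for r in (0.0, 0.5, 1.0):
    print(r, posterior_given_not_reported(0.5, 0.9, 0.1, r))
```

With `report_prob = 0` the missing report is uninformative and the posterior equals the prior; with `report_prob = 1` a missing report is equivalent to observing that the symptom is absent. Intermediate values capture the reporting bias the abstract describes.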


An SIR Graph Growth Model for the Epidemics of Communicable Diseases

arXiv.org Machine Learning

The main purpose of this paper is to introduce a graph-valued stochastic process to model the spread of a communicable infectious disease. The major novelty of the SIR model we promote lies in the fact that the social network on which the epidemic takes place is not specified in advance but evolves through time, accounting for the temporal evolution of the interactions involving infective individuals. Without assuming the existence of a fixed underlying network model, the stochastic process introduced describes, in a flexible and realistic manner, epidemic spread in non-uniformly mixing and possibly heterogeneous populations. It is shown how to fit such a (parametrised) model by means of Approximate Bayesian Computation methods based on graph-valued statistics. The concepts and statistical methods described in this paper are finally applied to a real epidemic dataset, related to the spread of HIV in Cuba in the presence of a contact tracing system, which permits one to partly reconstruct the evolution of the graph of sexual partners diagnosed HIV positive between 1986 and 2006.
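Approximate Bayesian Computation in its simplest rejection form can be sketched as follows. This is a generic illustration, not the paper's procedure: the toy simulator draws a scalar infection count rather than a graph, and the function names, the uniform prior, the tolerance and the observed value are all assumptions.

```python
import random

def abc_rejection(observed_stat, simulate, prior_sample, distance, tol, n_draws, seed=None):
    """Keep the prior draws whose simulated summary statistic lies within tol of the observed one."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)     # draw a parameter from the prior
        stat = simulate(theta, rng)   # simulate data and summarise it
        if distance(stat, observed_stat) <= tol:
            accepted.append(theta)
    return accepted

def simulate_infections(theta, rng, n_contacts=100):
    """Toy simulator (hypothetical): number of contacts infected, each with probability theta."""
    return sum(1 for _ in range(n_contacts) if rng.random() < theta)

# Hypothetical observation: 30 infections among 100 contacts.
posterior_draws = abc_rejection(
    observed_stat=30,
    simulate=simulate_infections,
    prior_sample=lambda rng: rng.random(),  # Uniform(0, 1) prior on theta
    distance=lambda a, b: abs(a - b),
    tol=5,
    n_draws=2000,
    seed=1,
)
print(len(posterior_draws))
```

In the setting of the abstract, the simulator would produce a growing graph and the summary statistic would be graph-valued; the accept/reject logic stays the same.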


A Bayesian Nonparametric Method for Clustering, Imputation, and Forecasting in Multivariate Time Series

arXiv.org Machine Learning

This article proposes a Bayesian nonparametric method for forecasting, imputation, and clustering in sparsely observed, multivariate time series. The method is appropriate for jointly modeling hundreds of time series with widely varying, non-stationary dynamics. Given a collection of $N$ time series, the Bayesian model first partitions them into independent clusters using a Chinese restaurant process prior. Within a cluster, all time series are modeled jointly using a novel "temporally-coupled" extension of the Chinese restaurant process mixture. Markov chain Monte Carlo techniques are used to obtain samples from the posterior distribution, which are then used to form predictive inferences. We apply the technique to challenging prediction and imputation tasks using seasonal flu data from the US Centers for Disease Control and Prevention, demonstrating competitive imputation performance and improved forecasting accuracy as compared to several state-of-the-art baselines. We also show that the model discovers interpretable clusters in datasets with hundreds of time series using macroeconomic data from the Gapminder Foundation.
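The partitioning step can be illustrated by drawing a partition from a standard Chinese restaurant process prior. This is a minimal sketch of the ordinary CRP only, not the "temporally-coupled" extension the abstract introduces; the function name, the concentration parameter `alpha` and the `seed` argument are assumptions.

```python
import random

def sample_crp_partition(n, alpha, seed=None):
    """Assign n items to clusters sequentially under a CRP with concentration alpha > 0."""
    rng = random.Random(seed)
    assignments = []  # cluster index of each item
    counts = []       # current size of each cluster
    for _ in range(n):
        # Join existing cluster k with weight counts[k], or open a new one with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments, counts

# Hypothetical use: partition 50 time series into clusters.
assignments, counts = sample_crp_partition(50, alpha=1.0, seed=0)
print(len(counts), counts)
```

Larger values of `alpha` tend to produce more clusters, and the number of clusters is not fixed in advance, which is what makes the prior suitable for collections of time series with an unknown number of groups.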


Defining Explanation in Probabilistic Systems

arXiv.org Artificial Intelligence

As probabilistic systems gain popularity and come into wider use, the need for a mechanism that explains the system's findings and recommendations becomes more critical. The system will also need a mechanism for ordering competing explanations. We examine two representative approaches to explanation in the literature - one due to Gärdenfors and one due to Pearl - and show that both suffer from significant problems. We propose an approach to defining a notion of "better explanation" that combines some of the features of both together with more recent work by Pearl and others on causality.