Bayesian Inference
Variational Bayes for high-dimensional linear regression with sparse priors
We study a mean-field spike and slab variational Bayes (VB) approximation to Bayesian model selection priors in sparse high-dimensional linear regression. Under compatibility conditions on the design matrix, oracle inequalities are derived for the mean-field VB approximation, implying that it converges to the sparse truth at the optimal rate and gives optimal prediction of the response vector. The empirical performance of our algorithm is studied, showing that it works comparably well as other state-of-the-art Bayesian variable selection methods. We also numerically demonstrate that the widely used coordinate-ascent variational inference (CAVI) algorithm can be highly sensitive to the parameter updating order, leading to potentially poor performance. To mitigate this, we propose a novel prioritized updating scheme that uses a data-driven updating order and performs better in simulations.
URSABench: Comprehensive Benchmarking of Approximate Bayesian Inference Methods for Deep Neural Networks
Vadera, Meet P., Cobb, Adam D., Jalaian, Brian, Marlin, Benjamin M.
While deep learning methods continue to improve This paper describes initial work on URSABench, an open in predictive accuracy on a wide range source suite of benchmarking tools for assessment of approximate of application domains, significant issues remain Bayesian inference methods applied to deep with other aspects of their performance including neural network classification tasks. URSABench includes their ability to quantify uncertainty and their benchmark models, data sets, tasks and evaluation metrics robustness. Recent advances in approximate focused on simultaneously assessing the uncertainty Bayesian inference hold significant promise for quantification performance, robustness, computational scalability addressing these concerns, but the computational and accuracy of learning and inference methods.
Robust Bayesian Classification Using an Optimistic Score Ratio
Nguyen, Viet Anh, Si, Nian, Blanchet, Jose
We build a Bayesian contextual classification model using an optimistic score ratio for robust binary classification when there is limited information on the class-conditional, or contextual, distribution. The optimistic score searches for the distribution that is most plausible to explain the observed outcomes in the testing sample among all distributions belonging to the contextual ambiguity set which is prescribed using a limited structural constraint on the mean vector and the covariance matrix of the underlying contextual distribution. We show that the Bayesian classifier using the optimistic score ratio is conceptually attractive, delivers solid statistical guarantees and is computationally tractable. We showcase the power of the proposed optimistic score ratio classifier on both synthetic and empirical data.
Detection of Gravitational Waves Using Bayesian Neural Networks
Lin, Yu-Chiung, Wu, Jiun-Huei Proty
We propose a new model of Bayesian Neural Networks to not only detect the events of compact binary coalescence in the observational data of gravitational waves (GW) but also identify the time periods of the associated GW waveforms before the events. This is achieved by incorporating the Bayesian approach into the CLDNN classifier, which integrates together the Convolutional Neural Network (CNN) and the Long Short-Term Memory Recurrent Neural Network (LSTM). Our model successfully detect all seven BBH events in the LIGO Livingston O2 data, with the periods of their GW waveforms correctly labeled. The ability of a Bayesian approach for uncertainty estimation enables a newly defined `awareness' state for recognizing the possible presence of signals of unknown types, which is otherwise rejected in a non-Bayesian model. Such data chunks labeled with the awareness state can then be further investigated rather than overlooked. Performance tests show that our model recognizes 90% of the events when the optimal signal-to-noise ratio $\rho_\text{opt} >7$ (100% when $\rho_\text{opt} >8.5$) and successfully labels more than 95% of the waveform periods when $\rho_\text{opt} >8$. The latency between the arrival of peak signal and generating an alert with the associated waveform period labeled is only about 20 seconds for an unoptimized code on a moderate GPU-equipped personal computer. This makes our model possible for nearly real-time detection and for forecasting the coalescence events when assisted with deeper training on a larger dataset using the state-of-art HPCs.
Accelerated Sparse Bayesian Learning via Screening Test and Its Applications
In high-dimensional settings, sparse structures are critical for efficiency in term of memory and computation complexity. For a linear system, to find the sparsest solution provided with an over-complete dictionary of features directly is typically NP-hard, and thus alternative approximate methods should be considered. In this paper, our choice for alternative method is sparse Bayesian learning, which, as empirical Bayesian approaches, uses a parameterized prior to encourage sparsity in solution, rather than the other methods with fixed priors such as LASSO. Screening test, however, aims at quickly identifying a subset of features whose coefficients are guaranteed to be zero in the optimal solution, and then can be safely removed from the complete dictionary to obtain a smaller, more easily solved problem. Next, we solve the smaller problem, after which the solution of the original problem can be recovered by padding the smaller solution with zeros. The performance of the proposed method will be examined on various data sets and applications.
Covariate Distribution Aware Meta-learning
Setlur, Amrith, Dingliwal, Saket, Poczos, Barnabas
Meta-learning has proven to be successful at few-shot learning across the regression, classification and reinforcement learning paradigms. Recent approaches have adopted Bayesian interpretations to improve gradient based meta-learners by quantifying the uncertainty of the post-adaptation estimates. Most of these works almost completely ignore the latent relationship between the covariate distribution (p(x)) of a task and the corresponding conditional distribution p(y|x). In this paper, we identify the need to explicitly model the meta-distribution over the task covariates in a hierarchical Bayesian framework. We begin by introducing a graphical model that explicitly leverages very few samples drawn from p(x) to better infer the posterior over the optimal parameters of the conditional distribution (p(y|x)) for each task. Based on this model we provide an inference strategy and a corresponding meta-algorithm that explicitly accounts for the meta-distribution over task covariates. Finally, we demonstrate the significant gains of our proposed algorithm on a synthetic regression dataset.
Constraint-Based Learning for Continuous-Time Bayesian Networks
Bregoli, Alessandro, Scutari, Marco, Stella, Fabio
Dynamic Bayesian networks have been well explored in the literature as discrete-time models; however, their continuous-time extensions have seen comparatively little attention. In this paper, we propose the first constraint-based algorithm for learning the structure of continuous-time Bayesian networks. We discuss the different statistical tests and the underlying hypotheses used by our proposal to establish conditional independence. Finally, we validate its performance using synthetic data, and discuss its strengths and limitations. We find that score-based is more accurate in learning networks with binary variables, while our constraint-based approach is more accurate with variables assuming more than two values. However, more experiments are needed for confirmation.
Solving Bayesian Network Structure Learning Problem with Integer Linear Programming
This dissertation investigates integer linear programming (ILP) formulation of Bayesian Network structure learning problem. We review the definition and key properties of Bayesian network and explain score metrics used to measure how well certain Bayesian network structure fits the dataset. We outline the integer linear programming formulation based on the decomposability of score metrics. In order to ensure acyclicity of the structure, we add ``cluster constraints'' developed specifically for Bayesian network, in addition to cycle constraints applicable to directed acyclic graphs in general. Since there would be exponential number of these constraints if we specify them fully, we explain the methods to add them as cutting planes without declaring them all in the initial model. Also, we develop a heuristic algorithm that finds a feasible solution based on the idea of sink node on directed acyclic graphs. We implemented the ILP formulation and cutting planes as a \textsf{Python} package, and present the results of experiments with different settings on reference datasets.
Semi-nonparametric Latent Class Choice Model with a Flexible Class Membership Component: A Mixture Model Approach
Sfeir, Georges, Abou-Zeid, Maya, Rodrigues, Filipe, Pereira, Francisco Camara, Kaysi, Isam
This study presents a semi-nonparametric Latent Class Choice Model (LCCM) with a flexible class membership component. The proposed model formulates the latent classes using mixture models as an alternative approach to the traditional random utility specification with the aim of comparing the two approaches on various measures including prediction accuracy and representation of heterogeneity in the choice process. Mixture models are parametric model-based clustering techniques that have been widely used in areas such as machine learning, data mining and patter recognition for clustering and classification problems. An Expectation-Maximization (EM) algorithm is derived for the estimation of the proposed model. Using two different case studies on travel mode choice behavior, the proposed model is compared to traditional discrete choice models on the basis of parameter estimates' signs, value of time, statistical goodness-of-fit measures, and cross-validation tests. Results show that mixture models improve the overall performance of latent class choice models by providing better out-of-sample prediction accuracy in addition to better representations of heterogeneity without weakening the behavioral and economic interpretability of the choice models.
Meta-Learning for Variational Inference
Zhang, Ruqi, Li, Yingzhen, De Sa, Christopher, Devlin, Sam, Zhang, Cheng
Variational inference (VI) plays an essential role in approximate Bayesian inference due to its computational efficiency and broad applicability. Crucial to the performance of VI is the selection of the associated divergence measure, as VI approximates the intractable distribution by minimizing this divergence. In this paper we propose a meta-learning algorithm to learn the divergence metric suited for the task of interest, automating the design of VI methods. In addition, we learn the initialization of the variational parameters without additional cost when our method is deployed in the few-shot learning scenarios. We demonstrate our approach outperforms standard VI on Gaussian mixture distribution approximation, Bayesian neural network regression, image generation with variational autoencoders and recommender systems with a partial variational autoencoder.