Bayesian Inference
Transforming variables to central normality
Raymaekers, Jakob, Rousseeuw, Peter J.
Many real data sets contain features (variables) whose distribution is far from normal (gaussian). Instead, their distribution is often skewed. In order to handle such data it is customary to preprocess the variables to make them more normal. The Box-Cox and Yeo-Johnson transformations are well-known tools for this. However, the standard maximum likelihood estimator of their transformation parameter is highly sensitive to outliers, and will often try to move outliers inward at the expense of the normality of the central part of the data. We propose an automatic preprocessing technique that is robust against such outliers, which transforms the data to central normality. It compares favorably to existing techniques in an extensive simulation study and on real data.
Conformal Prediction: a Unified Review of Theory and New Challenges
Zeni, Gianluca, Fontana, Matteo, Vantini, Simone
In this work we provide a review of basic ideas and novel developments about Conformal Prediction -- an innovative distribution-free, non-parametric forecasting method, based on minimal assumptions -- that is able to yield in a very straightforward way predictions sets that are valid in a statistical sense also in in the finite sample case. The in-depth discussion provided in the paper covers the theoretical underpinnings of Conformal Prediction, and then proceeds to list the more advanced developments and adaptations of the original idea.
MCMC-Based Learning of Finite Bivariate Beta Mixture Models
Rasti, Maryam (Concordia University ) | Manouchehri, Narges (Concordia University) | Bouguila, Nizar (Concordia University)
In this paper, we present a Bayesian approach for finite mixture models based on three-parameter bivariate Beta distributions. The estimation of the parameters is based on the Monte Carlo simulation technique of Gibbs sampling mixed with a Metropolis-Hastings step. The performance of our Bayesian algorithm is verified by several synthetic datasets and in the end, the feasibility of the proposed method is demonstrated by experimenting on some real datasets in which, the results are compared with those obtained by implementing the same approach using Gaussian mixture model.
Constaint-Based Learning for Non-Parametric Continuous Bayesian Networks
Lasserre, Marvin (Laboratoire d'Informatique de Paris 6 ) | Lebrun, Régis (Airbus AI Research) | Wuillemin, Pierre-Henri (Laboratoire d'Informatique de Paris 6)
Modeling high-dimensional multivariate distributions is a computationally challenging task. Bayesian networks have been successfully used to reduce the complexity and simplify the problem with discrete variables. However, it lacks of a general model for continuous variables. In order to overcome this problem, Elidan (2010) proposed the model of copula bayesian networks (CBN) that reparametrizes bayesian networks with conditional copula functions. We propose a new learning algorithm for CBN based on a PC algorithm and a conditional independence test proposed by Bouezmarni, Rombouts, Taamouti (2009). This test being non-parametric, no model assumptions are made allowing it to be as general as possible. This algorithm is compared on generated data with the score based method proposed by Elidan (2010)}. Not only it proves to be faster, but also it generalizes well on data generated from distributions far from the gaussian model.
Learning NAT-Modeled Bayesian Networks from Data
Xiang, Yang (University of Guelph ) | Wang, Qian (University of Guelph)
Bayesian networks (BNs) encode conditional independence to avoid combinatorial explosion on the number of variables, but are subject to exponential growth of space and inference time on the number of causes per effect variable. Among space-efficient local models, we focus on the Non-Impeding Noisy-AND Tree (NIN-AND Tree or NAT) models due to their multiple merits, and on NAT-modeled BNs where each multi-parent variable family may be encoded as a NAT-model. Although BN inference is generally exponential on treewidth, inference is tractable with NAT-modeled BNs of high treewidth and low density. In this work, we present the first study to learn NAT-modeled BNs from data. We apply the MDL principle to learning NAT-modeled BNs by developing a corresponding scoring function, and we couple it with heuristic structure search. We show that when data satisfy NAT causal independence, and high treewidth, low density structure, learning underlying NAT modeled BNs is feasible.
Improving the EDCM Mixture Model with Expectation Propagation
Sumba, Xavier (Concordia University ) | Zamzami, Nuha (Concordia University and King Abdulaziz University) | Bouguila, Nizar (Concordia University)
Bayesian inference is crucial to challenging scenarios that involve complex probabilistic models, which are usually intractable. In this work, we develop an expectation propagation approach to learn finite mixture models of EDCMs. The EDCM (Elkan 2006) is an exponential-family approximation to the widely used Dirichlet Compound Multinomial distribution and has been shown to offer excellent modeling capabilities in the case of sparse count data. Expectation propagation is a deterministic approach that provides accurate approximations to the full posterior and allows to include prior beliefs in the model as opposed to the maximum-likelihood method, which provides point estimates only. We evaluate the efficiency of our framework on several datasets for sentiment analysis and shape recognition. Our proposed model shows comparable to superior results to other approaches in the literature.
Stopping criterion for active learning based on deterministic generalization bounds
Ishibashi, Hideaki, Hino, Hideitsu
Active learning is a framework in which the learning machine can select the samples to be used for training. This technique is promising, particularly when the cost of data acquisition and labeling is high. In active learning, determining the timing at which learning should be stopped is a critical issue. In this study, we propose a criterion for automatically stopping active learning. The proposed stopping criterion is based on the difference in the expected generalization errors and hypothesis testing. We derive a novel upper bound for the difference in expected generalization errors before and after obtaining a new training datum based on PAC-Bayesian theory. Unlike ordinary PAC-Bayesian bounds, though, the proposed bound is deterministic; hence, there is no uncontrollable trade-off between the confidence and tightness of the inequality. We combine the upper bound with a statistical test to derive a stopping criterion for active learning. We demonstrate the effectiveness of the proposed method via experiments with both artificial and real datasets.
Variational Inference as Iterative Projection in a Bayesian Hilbert Space
Barfoot, Timothy D., D'Eleuterio, Gabriele M. T.
Variational Bayesian inference is an important machine-learning tool that finds application from statistics to robotics. The goal is to find an approximate probability density function (PDF) from a chosen family that is in some sense `closest' to the full Bayesian posterior. Closeness is typically defined through the selection of an appropriate loss functional such as the Kullback-Leibler (KL) divergence. In this paper, we explore a new formulation of variational inference by exploiting the fact that the set of PDFs constitutes a Bayesian Hilbert space under careful definitions of vector addition, scalar multiplication and an inner product. We show that variational inference based on KL divergence then amounts to an iterative projection of the Bayesian posterior onto a subspace corresponding to the selected approximation family. In fact, the inner product chosen for the Bayesian Hilbert space suggests the definition of a new measure of the information contained in a PDF and in turn a new divergence is introduced. Each step in the iterative projection is equivalent to a local minimization of this divergence. We present an example Bayesian subspace based on exponentiated Hermite polynomials as well as work through the details of this general framework for the specific case of the multivariate Gaussian approximation family and show the equivalence to another Gaussian variational inference approach. We furthermore discuss the implications for systems that exhibit sparsity, which is handled naturally in Bayesian space.
Crackovid: Optimizing Group Testing
Abraham, Louis, Bécigneul, Gary, Schölkopf, Bernhard
We study the problem usually referred to as group testing in the context of COVID-19. Given $n$ samples taken from patients, how should we select mixtures of samples to be tested, so as to maximize information and minimize the number of tests? We consider both adaptive and non-adaptive strategies, and take a Bayesian approach with a prior both for infection of patients and test errors. We start by proposing a mathematically principled objective, grounded in information theory. We then optimize non-adaptive optimization strategies using genetic algorithms, and leverage the mathematical framework of adaptive sub-modularity to obtain theoretical guarantees for the greedy-adaptive method.
Upper Bounds on the Generalization Error of Private Algorithms
Rodríguez-Gálvez, Borja, Bassi, Germán, Skoglund, Mikael
In this work, we study the generalization capability of algorithms from an information-theoretic perspective. It has been shown that the generalization error of an algorithm is bounded from above in terms of the mutual information between the algorithm's output hypothesis and the dataset with which it was trained. We build upon this fact and introduce a mathematical formulation to obtain upper bounds on this mutual information. We then develop a strategy using this formulation, based on the method of types and typicality, to find explicit upper bounds on the generalization error of smooth algorithms, i.e., algorithms that produce similar output hypotheses given similar input datasets. In particular, we show the bounds obtained with this strategy for the case of ɛ-DP and µ-GDP algorithms. A learning algorithm is a mechanism that takes a collection of data samples as an input and outputs a hypothesis. The usage of this type of algorithm spans from estimating the sinusoidal parameters of a received, noisy signal [1] to detecting and localizing a tumor from an MRI scan [2]. The generalization capability of a learning algorithm indicates its ability to perform similarly in new, unseen data, as it performed in the finite amount of data with which it was trained. Therefore, characterizing this capability allows us to evaluate the worth of an algorithm outside of the training data and, with a proper characterization framework, design robust algorithms.