Bayesian Inference
Understanding the applications of Probability in Machine Learning
Probability is a measure of uncertainty. Probability applies to machine learning because in the real world, we need to make decisions with incomplete information. Hence, we need a mechanism to quantify uncertainty โ which Probability provides us. Using probability, we can model elements of uncertainty such as risk in financial transactions and many other business processes. In contrast, in traditional programming, we work with deterministic problems i.e. the solution is not affected by uncertainty.
Continual Multi-task Gaussian Processes
Moreno-Muรฑoz, Pablo, Artรฉs-Rodrรญguez, Antonio, รlvarez, Mauricio A.
We address the problem of continual learning in multi-task Gaussian process (GP) models for handling sequential input-output observations. Our approach extends the existing prior-posterior recursion of online Bayesian inference, i.e.\ past posterior discoveries become future prior beliefs, to the infinite functional space setting of GP. For a reason of scalability, we introduce variational inference together with an sparse approximation based on inducing inputs. As a consequence, we obtain tractable continual lower-bounds where two novel Kullback-Leibler (KL) divergences intervene in a natural way. The key technical property of our method is the recursive reconstruction of conditional GP priors conditioned on the variational parameters learned so far. To achieve this goal, we introduce a novel factorization of past variational distributions, where the predictive GP equation propagates the posterior uncertainty forward. We then demonstrate that it is possible to derive GP models over many types of sequential observations, either discrete or continuous and amenable to stochastic optimization. The continual inference approach is also applicable to scenarios where potential multi-channel or heterogeneous observations might appear. Extensive experiments demonstrate that the method is fully scalable, shows a reliable performance and is robust to uncertainty error propagation over a plenty of synthetic and real-world datasets.
Quantifying (Hyper) Parameter Leakage in Machine Learning
Black Box Machine Learning models leak information about the proprietary model parameters and architecture, both through side channels and output predictions. An adversary can thus, exploit this leakage to reconstruct a substitute architecture similar to the target model, violating the model privacy and Intellectual Property. However, all such attacks, infer a subset of the target model attributes and identifying the rest of the architecture and parameters (optimally) is a search problem. Extracting the exact target model is not possible owing to the uncertainty in the inference attack outputs and stochastic nature of the training process. In this work, we propose a probabilistic framework, Airavata, to estimate the leakage in such model extraction attacks. Specifically, we use Bayesian Networks to capture the uncertainty, under the subjective notion of probability, in estimating the target model attributes using various model extraction attacks. We experimentally validate the model under different adversary assumptions commonly adopted by various model extraction attacks to reason about the attack efficacy. Further, this provides a practical approach of inferring actionable knowledge about extracting black box models and identify the best combination of attacks which maximise the knowledge extracted (information leaked) from the target model.
Dynamic Regularizer with an Informative Prior
Regularization methods, specifically those which directly alter weights like $L_1$ and $L_2$, are an integral part of many learning algorithms. Both the regularizers mentioned above are formulated by assuming certain priors in the parameter space and these assumptions, in some cases, induce sparsity in the parameter space. Regularizers help in transferring beliefs one has on the dataset or the parameter space by introducing adequate terms in the loss function. Any kind of formulation represents a specific set of beliefs: $L_1$ regularization conveys that the parameter space should be sparse whereas $L_2$ regularization conveys that the parameter space should be bounded and continuous. These regularizers in turn leverage certain priors to express these inherent beliefs. A better understanding of how the prior affects the behavior of the parameters and how the priors can be updated based on the dataset can contribute greatly in improving the generalization capabilities of a function estimator. In this work, we introduce a weakly informative prior and then further extend it to an informative prior in order to formulate a regularization penalty, which shows better results in terms of inducing sparsity experimentally, when compared to regularizers based only on Gaussian and Laplacian priors. Experimentally, we verify that a regularizer based on an adapted prior improves the generalization capabilities of any network. We illustrate the performance of the proposed method on the MNIST and CIFAR-10 datasets.
Parameter elimination in particle Gibbs sampling
Wigren, Anna, Risuleo, Riccardo Sven, Murray, Lawrence, Lindsten, Fredrik
Bayesian inference in state-space models is challenging due to high-dimensional state trajectories. A viable approach is particle Markov chain Monte Carlo, combining MCMC and sequential Monte Carlo to form "exact approximations" to otherwise intractable MCMC methods. The performance of the approximation is limited to that of the exact method. We focus on particle Gibbs and particle Gibbs with ancestor sampling, improving their performance beyond that of the underlying Gibbs sampler (which they approximate) by marginalizing out one or more parameters. This is possible when the parameter prior is conjugate to the complete data likelihood. Marginalization yields a non-Markovian model for inference, but we show that, in contrast to the general case, this method still scales linearly in time. While marginalization can be cumbersome to implement, recent advances in probabilistic programming have enabled its automation. We demonstrate how the marginalized methods are viable as efficient inference backends in probabilistic programming, and demonstrate with examples in ecology and epidemiology.
Bayesian causal inference via probabilistic program synthesis
Witty, Sam, Lew, Alexander, Jensen, David, Mansinghka, Vikash
Causal inference can be formalized as Bayesian inference that combines a prior distribution over causal models and likelihoods that account for both observations and interventions. We show that it is possible to implement this approach using a sufficiently expressive probabilistic programming language. Priors are represented using probabilistic programs that generate source code in a domain specific language. Interventions are represented using probabilistic programs that edit this source code to modify the original generative process. This approach makes it straightforward to incorporate data from atomic interventions, as well as shift interventions, variance-scaling interventions, and other interventions that modify causal structure. This approach also enables the use of general-purpose inference machinery for probabilistic programs to infer probable causal structures and parameters from data. This abstract describes a prototype of this approach in the Gen probabilistic programming language.
Neural Density Estimation and Likelihood-free Inference
I consider two problems in machine learning and statistics: the problem of estimating the joint probability density of a collection of random variables, known as density estimation, and the problem of inferring model parameters when their likelihood is intractable, known as likelihood-free inference. The contribution of the thesis is a set of new methods for addressing these problems that are based on recent advances in neural networks and deep learning.
Learning from both experts and data
Besson, Rรฉmi, Pennec, Erwan Le, Allassonniรจre, Stรฉphanie
In this work we study the problem of inferring a discrete probability distribution using both expert knowledge and empirical data. This is an important issue for many applications where the scarcity of data prevents a purely empirical approach. In this context, it is common to rely first on an initial domain knowledge a priori before proceeding to an online data acquisition. We are particularly interested in the intermediate regime where we do not have enough data to do without the initial expert a priori of the experts, but enough to correct it if necessary. We present here a novel way to tackle this issue with a method providing an objective way to choose the weight to be given to experts compared to data. We show, both empirically and theoretically, that our proposed estimator is always more efficient than the best of the two models (expert or data) within a constant.
Sampling of Bayesian posteriors with a non-Gaussian probabilistic learning on manifolds from a small dataset
Soize, Christian, Ghanem, Roger
This paper tackles the challenge presented by small-data to the task of Bayesian inference. A novel methodology, based on manifold learning and manifold sampling, is proposed for solving this computational statistics problem under the following assumptions: 1) neither the prior model nor the likelihood function are Gaussian and neither can be approximated by a Gaussian measure; 2) the number of functional input (system parameters) and functional output (quantity of interest) can be large; 3) the number of available realizations of the prior model is small, leading to the small-data challenge typically associated with expensive numerical simulations; the number of experimental realizations is also small; 4) the number of the posterior realizations required for decision is much larger than the available initial dataset. The method and its mathematical aspects are detailed. Three applications are presented for validation: The first two involve mathematical constructions aimed to develop intuition around the method and to explore its performance. The third example aims to demonstrate the operational value of the method using a more complex application related to the statistical inverse identification of the non-Gaussian matrix-valued random elasticity field of a damaged biological tissue (osteoporosis in a cortical bone) using ultrasonic waves.
Scalable Inference for Nonparametric Hawkes Process Using P\'{o}lya-Gamma Augmentation
Zhou, Feng, Li, Zhidong, Fan, Xuhui, Wang, Yang, Sowmya, Arcot, Chen, Fang
In this paper, we consider the sigmoid Gaussian Hawkes process model: the baseline intensity and triggering kernel of Hawkes process are both modeled as the sigmoid transformation of random trajectories drawn from Gaussian processes (GP). By introducing auxiliary latent random variables (branching structure, P\'{o}lya-Gamma random variables and latent marked Poisson processes), the likelihood is converted to two decoupled components with a Gaussian form which allows for an efficient conjugate analytical inference. Using the augmented likelihood, we derive an expectation-maximization (EM) algorithm to obtain the maximum a posteriori (MAP) estimate. Furthermore, we extend the EM algorithm to an efficient approximate Bayesian inference algorithm: mean-field variational inference. We demonstrate the performance of two algorithms on simulated fictitious data. Experiments on real data show that our proposed inference algorithms can recover well the underlying prompting characteristics efficiently.