Goto

Collaborating Authors

 Bayesian Inference


Fairness Measures for Regression via Probabilistic Classification

arXiv.org Machine Learning

Algorithmic fairness involves expressing notions such as equity, or reasonable treatment, as quantifiable measures that a machine learning algorithm can optimise. Most work in the literature to date has focused on classification problems where the prediction is categorical, such as accepting or rejecting a loan application. This is in part because classification fairness measures are easily computed by comparing the rates of outcomes, leading to behaviours such as ensuring that the same fraction of eligible men are selected as eligible women. But such measures are computationally difficult to generalise to the continuous regression setting for problems such as pricing, or allocating payments. The difficulty arises from estimating conditional densities (such as the probability density that a system will over-charge by a certain amount). For the regression setting we introduce tractable approximations of the independence, separation and sufficiency criteria by observing that they factorise as ratios of different conditional probabilities of the protected attributes. We introduce and train machine learning classifiers, distinct from the predictor, as a mechanism to estimate these probabilities from the data. This naturally leads to model agnostic, tractable approximations of the criteria, which we explore experimentally.


Quantified limits of the nuclear landscape

arXiv.org Machine Learning

The chart of the nuclides is limited by particle drip lines beyond which nuclear stability to proton or neutron emission is lost. Predicting the range of particle-bound isotopes poses an appreciable challenge for nuclear theory as it involves extreme extrapolations of nuclear masses beyond the regions where experimental information is available. Still, quantified extrapolations are crucial for a variety of applications, including the modeling of stellar nucleosynthesis. We use microscopic nuclear mass models and Bayesian methodology to provide quantified predictions of proton and neutron separation energies as well as Bayesian probabilities of existence throughout the nuclear landscape all the way to the particle drip lines. We apply nuclear density functional theory with several energy density functionals. To account for uncertainties, Bayesian Gaussian processes are trained on the separation-energy residuals for each individual model, and the resulting predictions are combined via Bayesian model averaging. This framework allows to account for systematic and statistical uncertainties and propagate them to extrapolative predictions. We characterize the drip-line regions where the probability that the nucleus is particle-bound decreases from $1$ to $0$. In these regions, we provide quantified predictions for one- and two-nucleon separation energies. According to our Bayesian model averaging analysis, 7759 nuclei with $Z\leq 119$ have a probability of existence $\geq 0.5$. The extrapolations obtained in this study will be put through stringent tests when new experimental information on exotic nuclei becomes available. In this respect, the quantified landscape of nuclear existence obtained in this study should be viewed as a dynamical prediction that will be fine-tuned when new experimental information and improved global mass models become available.


Masking schemes for universal marginalisers

arXiv.org Machine Learning

We consider the effect of structure-agnostic and structure-dependent masking schemes when training a universal marginaliser (arXiv:1711.00695) in order to learn conditional distributions of the form $P(x_i |\mathbf x_{\mathbf b})$, where $x_i$ is a given random variable and $\mathbf x_{\mathbf b}$ is some arbitrary subset of all random variables of the generative model of interest. In other words, we mimic the self-supervised training of a denoising autoencoder, where a dataset of unlabelled data is used as partially observed input and the neural approximator is optimised to minimise reconstruction loss. We focus on studying the underlying process of the partially observed data---how good is the neural approximator at learning all conditional distributions when the observation process at prediction time differs from the masking process during training? We compare networks trained with different masking schemes in terms of their predictive performance and generalisation properties.


Newtonian Monte Carlo: single-site MCMC meets second-order gradient methods

arXiv.org Machine Learning

Single-site Markov Chain Monte Carlo (MCMC) is a variant of MCMC in which a single coordinate in the state space is modified in each step. Structured relational models are a good candidate for this style of inference. In the single-site context, second order methods become feasible because the typical cubic costs associated with these methods is now restricted to the dimension of each coordinate. Our work, which we call Newtonian Monte Carlo (NMC), is a method to improve MCMC convergence by analyzing the first and second order gradients of the target density to determine a suitable proposal density at each point. Existing first order gradient-based methods suffer from the problem of determining an appropriate step size. Too small a step size and it will take a large number of steps to converge, while a very large step size will cause it to overshoot the high density region. NMC is similar to the Newton-Raphson update in optimization where the second order gradient is used to automatically scale the step size in each dimension. However, our objective is to find a parameterized proposal density rather than the maxima. As a further improvement on existing first and second order methods, we show that random variables with constrained supports don't need to be transformed before taking a gradient step. We demonstrate the efficiency of NMC on a number of different domains. For statistical models where the prior is conjugate to the likelihood, our method recovers the posterior quite trivially in one step. However, we also show results on fairly large non-conjugate models, where NMC performs better than adaptive first order methods such as NUTS or other inexact scalable inference methods such as Stochastic Variational Inference or bootstrapping.


Automated extraction of mutual independence patterns using Bayesian comparison of partition models

arXiv.org Machine Learning

Mutual independence is a key concept in statistics that characterizes the structural relationships between variables. Existing methods to investigate mutual independence rely on the definition of two competing models, one being nested into the other and used to generate a null distribution for a statistic of interest, usually under the asymptotic assumption of large sample size. As such, these methods have a very restricted scope of application. In the present manuscript, we propose to change the investigation of mutual independence from a hypothesis-driven task that can only be applied in very specific cases to a blind and automated search within patterns of mutual independence. To this end, we treat the issue as one of model comparison that we solve in a Bayesian framework. We show the relationship between such an approach and existing methods in the case of multivariate normal distributions as well as cross-classified multinomial distributions. We propose a general Markov chain Monte Carlo (MCMC) algorithm to numerically approximate the posterior distribution on the space of all patterns of mutual independence. The relevance of the method is demonstrated on synthetic data as well as two real datasets, showing the unique insight provided by this approach.


Mathematics Behind AI & Machine Learning

#artificialintelligence

Let's face reality, mathematics is far from being enjoyable. To learn it, we often lack time, and most importantly, motivation. Why do we need all these symbols and a bunch of figures? It turns out, a lot of sense. Especially if you have something to do with machine learning. The point here is not to acquire knowledge, but to be able to use it.


Sparse Covariance Estimation in Logit Mixture Models

arXiv.org Machine Learning

This paper introduces a new data-driven methodology for estimating sparse covariance matrices of the random coefficients in logit mixture models. Researchers typically specify covariance matrices in logit mixture models under one of two extreme assumptions: either an unrestricted full covariance matrix (allowing correlations between all random coefficients), or a restricted diagonal matrix (allowing no correlations at all). Our objective is to find optimal subsets of correlated coefficients for which we estimate covariances. We propose a new estimator, called MISC, that uses a mixed-integer optimization (MIO) program to find an optimal block diagonal structure specification for the covariance matrix, corresponding to subsets of correlated coefficients, for any desired sparsity level using Markov Chain Monte Carlo (MCMC) posterior draws from the unrestricted full covariance matrix. The optimal sparsity level of the covariance matrix is determined using out-of-sample validation. We demonstrate the ability of MISC to correctly recover the true covariance structure from synthetic data. In an empirical illustration using a stated preference survey on modes of transportation, we use MISC to obtain a sparse covariance matrix indicating how preferences for attributes are related to one another.


Analysis of Bayesian Inference Algorithms by the Dynamical Functional Approach

arXiv.org Machine Learning

We analyze the dynamics of an algorithm for approximate inference with large Gaussian latent variable models in a student-teacher scenario. To model nontrivial dependencies between the latent variables, we assume random covariance matrices drawn from rotation invariant ensembles. For the case of perfect data-model matching, the knowledge of static order parameters derived from the replica method allows us to obtain efficient algorithmic updates in terms of matrix-vector multiplications with a fixed matrix. Using the dynamical functional approach, we obtain an exact effective stochastic process in the thermodynamic limit for a single node. From this, we obtain closed-form expressions for the rate of the convergence. Analytical results are excellent agreement with simulations of single instances of large models.


Efficient Debiased Variational Bayes by Multilevel Monte Carlo Methods

arXiv.org Machine Learning

Variational Bayes is a method to find a good approximation of the posterior probability distribution of latent variables from a parametric family of distributions. The evidence lower bound (ELBO), which is nothing but the model evidence minus the Kullback-Leibler divergence, has been commonly used as a quality measure in the optimization process. However, the model evidence itself has been considered computationally intractable since it is expressed as a nested expectation with an outer expectation with respect to the training dataset and an inner conditional expectation with respect to latent variables. Similarly, if the Kullback-Leibler divergence is replaced with another divergence metric, the corresponding lower bound on the model evidence is often given by such a nested expectation. The standard (nested) Monte Carlo method can be used to estimate such quantities, whereas the resulting estimate is biased and the variance is often quite large. Recently the authors provided an unbiased estimator of the model evidence with small variance by applying the idea from multilevel Monte Carlo (MLMC) methods. In this article, we give more examples involving nested expectations in the context of variational Bayes where MLMC methods can help construct low-variance unbiased estimators, and provide numerical results which demonstrate the effectiveness of our proposed estimators.


Robust Gaussian Process Regression with a Bias Model

arXiv.org Machine Learning

This paper presents a new approach to a robust Gaussian process (GP) regression. Most existing approaches replace an outlier-prone Gaussian likelihood with a non-Gaussian likelihood induced from a heavy tail distribution, such as the Laplace distribution and Student-t distribution. However, the use of a non-Gaussian likelihood would incur the need for a computationally expensive Bayesian approximate computation in the posterior inferences. The proposed approach models an outlier as a noisy and biased observation of an unknown regression function, and accordingly, the likelihood contains bias terms to explain the degree of deviations from the regression function. We entail how the biases can be estimated accurately with other hyperparameters by a regularized maximum likelihood estimation. Conditioned on the bias estimates, the robust GP regression can be reduced to a standard GP regression problem with analytical forms of the predictive mean and variance estimates. Therefore, the proposed approach is simple and very computationally attractive. It also gives a very robust and accurate GP estimate for many tested scenarios. For the numerical evaluation, we perform a comprehensive simulation study to evaluate the proposed approach with the comparison to the existing robust GP approaches under various simulated scenarios of different outlier proportions and different noise levels. The approach is applied to data from two measurement systems, where the predictors are based on robust environmental parameter measurements and the response variables utilize more complex chemical sensing methods that contain a certain percentage of outliers. The utility of the measurement systems and value of the environmental data are improved through the computationally efficient GP regression and bias model.