Goto

Collaborating Authors

 Bayesian Inference


Exact Recovery of Community Detection in k-Community Gaussian Mixture Model

arXiv.org Machine Learning

We study the community detection problem on a Gaussian mixture model, in which vertices are divided into $k\geq 2$ distinct communities. The major difference in our model is that the intensities for Gaussian perturbations are different for different entries in the observation matrix, and we do not assume that every community has the same number of vertices. We explicitly find the threshold for the exact recovery of the maximum likelihood estimation. Applications include the community detection on hypergraphs.


Hierarchical Marketing Mix Models with Sign Constraints

arXiv.org Machine Learning

Marketing mix models (MMMs) are statistical models for measuring the effectiveness of various marketing activities such as promotion, media advertisement, etc. In this research, we propose a comprehensive marketing mix model that captures the hierarchical structure and the carryover, shape and scale effects of certain marketing activities, as well as sign restrictions on certain coefficients that are consistent with common business sense. In contrast to commonly adopted approaches in practice, which estimate parameters in a multi-stage process, the proposed approach estimates all the unknown parameters/coefficients simultaneously using a constrained maximum likelihood approach and solved with the Hamiltonian Monte Carlo algorithm. We present results on real datasets to illustrate the use of the proposed solution algorithm.


BLOB : A Probabilistic Model for Recommendation that Combines Organic and Bandit Signals

arXiv.org Machine Learning

A common task for recommender systems is to build a pro le of the interests of a user from items in their browsing history and later to recommend items to the user from the same catalog. The users' behavior consists of two parts: the sequence of items that they viewed without intervention (the organic part) and the sequences of items recommended to them and their outcome (the bandit part). In this paper, we propose Bayesian Latent Organic Bandit model (BLOB), a probabilistic approach to combine the 'or-ganic' and 'bandit' signals in order to improve the estimation of recommendation quality. The bandit signal is valuable as it gives direct feedback of recommendation performance, but the signal quality is very uneven, as it is highly concentrated on the recommendations deemed optimal by the past version of the recom-mender system. In contrast, the organic signal is typically strong and covers most items, but is not always relevant to the recommendation task. In order to leverage the organic signal to e ciently learn the bandit signal in a Bayesian model we identify three fundamental types of distances, namely action-history, action-action and history-history distances. We implement a scalable approximation of the full model using variational auto-encoders and the local re-paramerization trick. We show using extensive simulation studies that our method out-performs or matches the value of both state-of-the-art organic-based recommendation algorithms, and of bandit-based methods (both value and policy-based) both in organic and bandit-rich environments.


Unifying supervised learning and VAEs -- automating statistical inference in high-energy physics

arXiv.org Machine Learning

A KL-divergence objective of the joint distribution of data and labels allows to unify supervised learning, variational autoencoders (VAEs) and semi-supervised learning under one umbrella of variational inference. This viewpoint has several advantages. For VAEs, it clarifies the interpretation of encoder and decoder parts. For supervised learning, it re-iterates that the training procedure approximates the true posterior over labels and can always be viewed as approximate likelihood-free inference. This is typically not discussed, even though the derivation is well-known in the literature. In the context of semi-supervised learning it motivates an extended supervised scheme which allows to calculate a goodness-of-fit p-value using posterior predictive simulations. Flow-based networks with a standard normal base distribution are crucial. We discuss how they allow to rigorously define coverage for arbitrary joint posteriors on $\mathbb{R}^n \times \mathcal{S}^m$, which encompasses posteriors over directions. Finally, systematic uncertainties are naturally included in the variational viewpoint. With the three ingredients of (1) systematics, (2) coverage and (3) goodness-of-fit, flow-based neural networks have the potential to replace a large part of the statistical toolbox of the contemporary high-energy physicist.


Latent feature sharing: an adaptive approach to linear decomposition models

arXiv.org Machine Learning

Ubiquitous linear Gaussian exploratory tools such as principle component analysis (PCA) and factor analysis (FA) remain widely used as tools for: exploratory analysis, pre-processing, data visualization and related tasks. However, due to their rigid assumptions including crowding of high dimensional data, they have been replaced in many settings by more flexible and still interpretable latent feature models. The Feature allocation is usually modelled using discrete latent variables assumed to follow either parametric Beta-Bernoulli distribution or Bayesian nonparametric prior. In this work we propose a simple and tractable parametric feature allocation model which can address key limitations of current latent feature decomposition techniques. The new framework allows for explicit control over the number of features used to express each point and enables a more flexible set of allocation distributions including feature allocations with different sparsity levels. This approach is used to derive a novel adaptive Factor analysis (aFA), as well as, an adaptive probabilistic principle component analysis (aPPCA) capable of flexible structure discovery and dimensionality reduction in a wide case of scenarios. We derive both standard Gibbs sampler, as well as, an expectation-maximization inference algorithms that converge orders of magnitude faster to a reasonable point estimate solution. The utility of the proposed aPPCA model is demonstrated for standard PCA tasks such as feature learning, data visualization and data whitening. We show that aPPCA and aFA can infer interpretable high level features both when applied on raw MNIST and when applied for interpreting autoencoder features. We also demonstrate an application of the aPPCA to more robust blind source separation for functional magnetic resonance imaging (fMRI).


Learning All Credible Bayesian Network Structures for Model Averaging

arXiv.org Artificial Intelligence

A Bayesian network is a widely used probabilistic graphical model with applications in knowledge discovery and prediction. Learning a Bayesian network (BN) from data can be cast as an optimization problem using the well-known score-and-search approach. However, selecting a single model (i.e., the best scoring BN) can be misleading or may not achieve the best possible accuracy. An alternative to committing to a single model is to perform some form of Bayesian or frequentist model averaging, where the space of possible BNs is sampled or enumerated in some fashion. Unfortunately, existing approaches for model averaging either severely restrict the structure of the Bayesian network or have only been shown to scale to networks with fewer than 30 random variables. In this paper, we propose a novel approach to model averaging inspired by performance guarantees in approximation algorithms. Our approach has two primary advantages. First, our approach only considers credible models in that they are optimal or near-optimal in score. Second, our approach is more efficient and scales to significantly larger Bayesian networks than existing approaches.


Causal Feature Learning for Utility-Maximizing Agents

arXiv.org Artificial Intelligence

Discovering high-level causal relations from low-level data is an important and challenging problem that comes up frequently in the natural and social sciences. In a series of papers, Chalupka et al. (2015, 2016a, 2016b, 2017) develop a procedure for causal feature learning (CFL) in an effort to automate this task. We argue that CFL does not recommend coarsening in cases where pragmatic considerations rule in favor of it, and recommends coarsening in cases where pragmatic considerations rule against it. We propose a new technique, pragmatic causal feature learning (PCFL), which extends the original CFL algorithm in useful and intuitive ways. We show that PCFL has the same attractive measure-theoretic properties as the original CFL algorithm. We compare the performance of both methods through theoretical analysis and experiments.


Meta-Learning with Shared Amortized Variational Inference

arXiv.org Machine Learning

We propose a novel amortized variational inference scheme for an empirical Bayes meta-learning model, where model parameters are treated as latent variables. We learn the prior distribution over model parameters conditioned on limited training data using a variational autoencoder approach. Our framework proposes sharing the same amortized inference network between the conditional prior and variational posterior distributions over the model parameters. While the posterior leverages both the labeled support and query data, the conditional prior is based only on the labeled support data. We show that in earlier work, relying on Monte-Carlo approximation, the conditional prior collapses to a Dirac delta function. In contrast, our variational approach prevents this collapse and preserves uncertainty over the model parameters. We evaluate our approach on the miniImageNet, CIFAR-FS and FC100 datasets, and present results demonstrating its advantages over previous work.


The linear conditional expectation in Hilbert space

arXiv.org Machine Learning

The linear conditional expectation (LCE) provides a best linear (or rather, affine) estimate of the conditional expectation and hence plays an important r\^ole in approximate Bayesian inference, especially the Bayes linear approach. This article establishes the analytical properties of the LCE in an infinite-dimensional Hilbert space context. In addition, working in the space of affine Hilbert--Schmidt operators, we establish a regularisation procedure for this LCE. As an important application, we obtain a simple alternative derivation and intuitive justification of the conditional mean embedding formula, a concept widely used in machine learning to perform the conditioning of random variables by embedding them into reproducing kernel Hilbert spaces.


Variable selection for Gaussian process regression through a sparse projection

arXiv.org Machine Learning

This paper presents a new variable selection approach integrated with Gaussian process (GP) regression. We consider a sparse projection of input variables and a general stationary covariance model that depends on the Euclidean distance between the projected features. The sparse projection matrix is considered as an unknown parameter. We propose a forward stagewise approach with embedded gradient descent steps to co-optimize the parameter with other covariance parameters based on the maximization of a non-convex marginal likelihood function with a concave sparsity penalty, and some convergence properties of the algorithm are provided. The proposed model covers a broader class of stationary covariance functions than the existing automatic relevance determination approaches, and the solution approach is more computationally feasible than the existing MCMC sampling procedures for the automatic relevance parameter estimation with a sparsity prior. The approach is evaluated for a large number of simulated scenarios. The choice of tuning parameters and the accuracy of the parameter estimation are evaluated with the simulation study. In the comparison to some chosen benchmark approaches, the proposed approach has provided a better accuracy in the variable selection. It is applied to an important problem of identifying environmental factors that affect an atmospheric corrosion of metal alloys.