Bayesian Inference
Bayesian inference of infected patients in group testing with prevalence estimation
Group testing is a method of identifying infected patients by performing tests on a pool of specimens collected from patients. For the case in which the test returns a false result with finite probability, we propose Bayesian inference and a corresponding belief propagation (BP) algorithm to identify the infected patients from the results of tests performed on the pool. We show that the true-positive rate is improved by taking into account the credible interval of a point estimate of each patient. Further, the prevalence and the error probability in the test are estimated by combining an expectation-maximization method with the BP algorithm. As another approach, we introduce a hierarchical Bayes model to identify the infected patients and estimate the prevalence. By comparing these methods, we formulate a guide for practical usage.
Scalable Control Variates for Monte Carlo Methods via Stochastic Optimization
Si, Shijing, Oates, Chris. J., Duncan, Andrew B., Carin, Lawrence, Briol, François-Xavier
Control variates are a well-established tool to reduce the variance of Monte Carlo estimators. However, for large-scale problems including high-dimensional and large-sample settings, their advantages can be outweighed by a substantial computational cost. This paper considers control variates based on Stein operators, presenting a framework that encompasses and generalizes existing approaches that use polynomials, kernels and neural networks. A learning strategy based on minimising a variational objective through stochastic optimization is proposed, leading to scalable and effective control variates. Our results are both empirical, based on a range of test functions and problems in Bayesian inference, and theoretical, based on an analysis of the variance reduction that can be achieved.
Uncertainty quantification using martingales for misspecified Gaussian processes
Neiswanger, Willie, Ramdas, Aaditya
We address uncertainty quantification for Gaussian processes (GPs) under misspecified priors, with an eye towards Bayesian Optimization (BO). GPs are widely used in BO because they easily enable exploration based on posterior uncertainty bands. However, this convenience comes at the cost of robustness: a typical function encountered in practice is unlikely to have been drawn from the data scientist's prior, in which case uncertainty estimates can be misleading, and the resulting exploration can be suboptimal. This brittle behavior is convincingly demonstrated in simple simulations. We present a frequentist approach to GP/BO uncertainty quantification. We utilize the GP framework as a working model, but do not assume correctness of the prior. We instead construct a confidence sequence (CS) for the unknown function using martingale techniques. There is a necessary cost to achieving robustness: if the prior was correct, posterior GP bands are narrower than our CS. Nevertheless, when the prior is wrong, our CS is statistically valid and empirically outperforms standard GP methods, in terms of both coverage and utility for BO. Additionally, we demonstrate that powered likelihoods provide robustness against model misspecification.
Conditional Sampling With Monotone GANs
Kovachki, Nikola, Baptista, Ricardo, Hosseini, Bamdad, Marzouk, Youssef
We present a new approach for sampling conditional measures that enables uncertainty quantification in supervised learning tasks. We construct a mapping that transforms a reference measure to the probability measure of the output conditioned on new inputs. The mapping is trained via a modification of generative adversarial networks (GANs), called monotone GANs, that imposes monotonicity constraints and a block triangular structure. We present theoretical results, in an idealized setting, that support our proposed method as well as numerical experiments demonstrating the ability of our method to sample the correct conditional measures in applications ranging from inverse problems to image in-painting.
A New Perspective on Learning Context-Specific Independence
Shen, Yujia, Choi, Arthur, Darwiche, Adnan
Local structure such as context-specific independence (CSI) has received much attention in the probabilistic graphical model (PGM) literature, as it facilitates the modeling of large complex systems, as well as for reasoning with them. In this paper, we provide a new perspective on how to learn CSIs from data. We propose to first learn a functional and parameterized representation of a conditional probability table (CPT), such as a neural network. Next, we quantize this continuous function, into an arithmetic circuit representation that facilitates efficient inference. In the first step, we can leverage the many powerful tools that have been developed in the machine learning literature. In the second step, we exploit more recently-developed analytic tools from explainable AI, for the purposes of learning CSIs. Finally, we contrast our approach, empirically and conceptually, with more traditional variable-splitting approaches, that search for CSIs more explicitly.
Contrastive Learning for Debiased Candidate Generation in Large-Scale Recommender Systems
Zhou, Chang, Ma, Jianxin, Zhang, Jianwei, Zhou, Jingren, Yang, Hongxia
Deep candidate generation (DCG) that narrows down the collection of relevant items from billions to hundreds via representation learning is essential to large-scale recommender systems. Standard approaches approximate maximum likelihood estimation (MLE) through sampling for better scalability and address the problem of DCG in a way similar to language modeling. However, live recommender systems face severe unfairness of exposure with a vocabulary several orders of magnitude larger than that of natural language, implying that (1) MLE will preserve and even exacerbate the exposure bias in the long run in order to faithfully fit the observed samples, and (2) suboptimal sampling and inadequate use of item features can lead to inferior representations for the unfairly ignored items. In this paper, we introduce CLRec, a Contrastive Learning paradigm that has been successfully deployed in a real-world massive recommender system, to alleviate exposure bias in DCG. We theoretically prove that a popular choice of contrastive loss is equivalently reducing the exposure bias via inverse propensity scoring, which provides a new perspective on the effectiveness of contrastive learning. We further employ a fixed-size queue to store the items' representations computed in previously processed batches, and use the queue to serve as an effective sampler of negative examples. This queue-based design provides great efficiency in incorporating rich features of the thousand negative items per batch thanks to computation reuse. Extensive offline analyses and four-month online A/B tests in Mobile Taobao demonstrate substantial improvement, including a dramatic reduction in the Matthew effect.
Adversarial Likelihood-Free Inference on Black-Box Generator
Kim, Dongjun, Joo, Weonyoung, Shin, Seungjae, Song, Kyungwoo, Moon, Il-Chul
Generative Adversarial Network (GAN) can be viewed as an implicit estimator of a data distribution, and this perspective motivates using the adversarial concept in the true input parameter estimation of black-box generators. While previous works on likelihood-free inference introduces an implicit proposal distribution on the generator input, this paper analyzes theoretic limitations of the proposal distribution approach. On top of that, we introduce a new algorithm, Adversarial Likelihood-Free Inference (ALFI), to mitigate the analyzed limitations, so ALFI is able to find the posterior distribution on the input parameter for black-box generative models. We experimented ALFI with diverse simulation models as well as pre-trained statistical models, and we identified that ALFI achieves the best parameter estimation accuracy with a limited simulation budget.
Bayesian Experience Reuse for Learning from Multiple Demonstrators
Gimelfarb, Michael, Sanner, Scott, Lee, Chi-Guhn
Learning from demonstrations (LfD) improves the exploration efficiency of a learning agent by incorporating demonstrations from experts. However, demonstration data can often come from multiple experts with conflicting goals, making it difficult to incorporate safely and effectively in online settings. We address this problem in the static and dynamic optimization settings by modelling the uncertainty in source and target task functions using normal-inverse-gamma priors, whose corresponding posteriors are, respectively, learned from demonstrations and target data using Bayesian neural networks with shared features. We use this learned belief to derive a quadratic programming problem whose solution yields a probability distribution over the expert models. Finally, we propose Bayesian Experience Reuse (BERS) to sample demonstrations in accordance with this distribution and reuse them directly in new tasks. We demonstrate the effectiveness of this approach for static optimization of smooth functions, and transfer learning in a high-dimensional supply chain problem with cost uncertainty.
A Bayesian Framework for Nash Equilibrium Inference in Human-Robot Parallel Play
Bansal, Shray, Xu, Jin, Howard, Ayanna, Isbell, Charles
We consider shared workspace scenarios with humans and robots acting to achieve independent goals, termed as parallel play. We model these as general-sum games and construct a framework that utilizes the Nash equilibrium solution concept to consider the interactive effect of both agents while planning. We find multiple Pareto-optimal equilibria in these tasks. We hypothesize that people act by choosing an equilibrium based on social norms and their personalities. To enable coordination, we infer the equilibrium online using a probabilistic model that includes these two factors and use it to select the robot's action. We apply our approach to a close-proximity pick-and-place task involving a robot and a simulated human with three potential behaviors - defensive, selfish, and norm-following. We showed that using a Bayesian approach to infer the equilibrium enables the robot to complete the task with less than half the number of collisions while also reducing the task execution time as compared to the best baseline. We also performed a study with human participants interacting either with other humans or with different robot agents and observed that our proposed approach performs similar to human-human parallel play interactions. The code is available at https://github.com/shray/bayes-nash
Optimal Continual Learning has Perfect Memory and is NP-hard
Knoblauch, Jeremias, Husain, Hisham, Diethe, Tom
Continual Learning (CL) algorithms incrementally learn a predictor or representation across multiple sequentially observed tasks. Designing CL algorithms that perform reliably and avoid so-called catastrophic forgetting has proven a persistent challenge. The current paper develops a theoretical approach that explains why. In particular, we derive the computational properties which CL algorithms would have to possess in order to avoid catastrophic forgetting. Our main finding is that such optimal CL algorithms generally solve an NP-hard problem and will require perfect memory to do so. The findings are of theoretical interest, but also explain the excellent performance of CL algorithms using experience replay, episodic memory and core sets relative to regularization-based approaches.