 Bayesian Inference


Reviews: Bayesian Model Selection Approach to Boundary Detection with Non-Local Priors

Neural Information Processing Systems

Update: I have downgraded my score to 5. My main concern is that more extensive simulations would make the results more convincing, since the numerical experiments are the only way to assess the performance of the proposed priors. It might take a major revision to reflect such comprehensive comparisons. That said, I believe the paper does contain interesting results that are novel and useful to the community. In particular, the theoretical results seem sound, and the paper is fairly readable. But I think there is also room for improvement.


Reviews: Bayesian Adversarial Learning

Neural Information Processing Systems

This paper proposes a Bayesian model for the adversarial learning problem. Empirical studies on Fashion-MNIST and traffic sign recognition show that the proposed method is slightly better than other adversarial learning baselines. Below I list my concerns about the paper. For modeling: 1. This paper ignores a highly relevant work, 'Bayesian GAN' [1]. The non-cooperative game between the 'data generator' and the 'learner' established in this paper is almost the same as in the vanilla GAN.


Reviews: Robust Learning of Fixed-Structure Bayesian Networks

Neural Information Processing Systems

I preface this by saying that I reviewed this paper once before, for NIPS 2016, and have re-read it. It seems the paper has no essential changes, so my opinion is largely the same. The paper considers the problem of learning the parameters of a Bayes net with known structure, given samples from it with potentially adversarial noise. The main goal is to get bounds on the number of samples that are independent of the dimension. The main requirements on the Bayes net parameters are reasonable: the probability of any configuration of the parents is reasonable and the conditional probabilities on any edge are bounded away from 0 and 1.


Reviews: Amortized Inference Regularization

Neural Information Processing Systems

This paper puts forward the idea that we should in certain cases regularize the generative model in VAEs in order to improve generalization properties. Since VAEs perform maximum likelihood estimation, they can in principle exhibit the same overfitting problems as any other maximum likelihood model. This paper argues that we can regularize the generative model by increasing the smoothness of the inference model. The authors consider the Denoising VAE (DVAE) as a means of achieving such regularization. In the special case where the encoder is an exponential family, they show that the optimum natural parameters for any input data can be expressed as a weighted average over the optimum parameters for the data in the training set.
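The weighted-average result mentioned in this review can be pictured with a small sketch. It computes the parameters for a new input as a normalized, kernel-weighted average of per-example parameters from the training set. The Gaussian kernel, the bandwidth `sigma`, and all function names are assumptions made for illustration; the review does not state the exact form of the weights.

```python
import math

def kernel_weights(x, train_xs, sigma):
    """Normalized Gaussian-kernel weights of x against each training input (assumed kernel)."""
    sq_dists = [sum((a - b) ** 2 for a, b in zip(x, xi)) for xi in train_xs]
    # Subtract the smallest distance before exponentiating for numerical stability.
    d_min = min(sq_dists)
    unnorm = [math.exp(-(d - d_min) / (2.0 * sigma ** 2)) for d in sq_dists]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def smoothed_parameters(x, train_xs, train_params, sigma):
    """Weighted average of the per-example parameter vectors in train_params."""
    weights = kernel_weights(x, train_xs, sigma)
    dim = len(train_params[0])
    return [sum(w * p[k] for w, p in zip(weights, train_params)) for k in range(dim)]

# Hypothetical toy data: two training inputs with one-dimensional parameters.
train_xs = [[0.0], [1.0]]
train_params = [[0.0], [2.0]]
midpoint = smoothed_parameters([0.5], train_xs, train_params, sigma=1.0)
```

In this toy setting a larger `sigma` pulls the output toward the global average of the training parameters, while a small `sigma` makes it track the nearest training example, which is one way to read the smoothness argument in the review.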


Reviews: Bayesian Inference of Temporal Task Specifications from Demonstrations

Neural Information Processing Systems

The authors introduce a probabilistic model for inferring a task specification as a linear temporal logic (LTL) formula. This is encoded as three different behaviors, represented by LTL templates. The authors present linear chains, sets of linear chains, and forests of sub-tasks as prior distributions, as well as complexity-based and complexity-independent domain-agnostic likelihood functions. Given a set of demonstrations, the authors perform inference to obtain a posterior distribution over candidate formulas, which represent task specifications. The authors show that their method is able to recover accurate task specifications from demonstrations in both simulated domains and a real-world dinner table domain. The authors provide a good background on LTL.


Reviews: Differentially Private Bayesian Inference for Exponential Families

Neural Information Processing Systems

This paper proposes an approach for differentially private estimation of the posterior distribution in conjugate exponential-family models. Similar to previous "naive" approaches, it enforces privacy by adding Laplace-distributed noise to the sufficient statistic. Where a naive approach would treat this noisy statistic as true, the main contribution of this paper is a Gibbs sampling algorithm to integrate over uncertainty in the true statistic given the observed noisy statistic. This is the proper Bayesian procedure, and the experiments show that this yields better-calibrated posterior estimates than naive updating or one-posterior sampling (OPS). The paper is very clear, cleanly written and easy to follow; I found no obvious mistakes.
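For concreteness, here is a minimal sketch of the "naive" baseline the review describes, not of the paper's Gibbs sampler. It uses a Beta-Bernoulli model, adds Laplace noise to the sufficient statistic (the count of ones), and then updates the posterior as if the noisy count were exact. The model choice, the sensitivity of 1 (which assumes a public dataset size and neighboring datasets that differ in one record), the clamping step, and all names are assumptions for illustration.

```python
import random

def laplace_noise(scale, rng):
    """Laplace(0, scale) sample: the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_sufficient_statistic(data, epsilon, rng):
    """Release the count of ones with Laplace noise of scale 1/epsilon (assumed sensitivity 1)."""
    true_count = sum(data)
    return true_count + laplace_noise(1.0 / epsilon, rng)

def naive_beta_posterior(noisy_count, n, alpha=1.0, beta=1.0):
    """Naive update: treat the noisy count as the true count, clamped to [0, n]."""
    s = min(max(noisy_count, 0.0), float(n))
    return alpha + s, beta + (n - s)

# Hypothetical data: 30 ones and 70 zeros, privacy parameter epsilon = 1.
rng = random.Random(0)
data = [1] * 30 + [0] * 70
noisy = noisy_sufficient_statistic(data, epsilon=1.0, rng=rng)
post_alpha, post_beta = naive_beta_posterior(noisy, len(data))
```

The naive posterior is exactly as concentrated as one computed from the true count, so it ignores the extra uncertainty introduced by the noise. That is the gap the review credits the paper with closing by integrating over the unobserved true statistic.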


Score-Based Variational Inference for Inverse Problems

arXiv.org Artificial Intelligence

Existing diffusion-based methods for inverse problems sample from the posterior using score functions and accept the generated random samples as solutions. In applications where the posterior mean is preferred, we have to generate multiple samples from the posterior, which is time-consuming. In this work, by analyzing the probability density evolution of the conditional reverse diffusion process, we prove that the posterior mean can be achieved by tracking the mean of each reverse diffusion step. Based on that, we establish a framework termed reverse mean propagation (RMP) that targets the posterior mean directly. We show that RMP can be implemented by solving a variational inference problem, which can be further decomposed as minimizing a reverse KL divergence at each reverse step. We further develop an algorithm that optimizes the reverse KL divergence with natural gradient descent using score functions and propagates the mean at each reverse step. Experiments demonstrate the validity of the theory of our framework and show that our algorithm outperforms state-of-the-art algorithms on reconstruction performance with lower computational complexity in various inverse problems.


Leveraging free energy in pretraining model selection for improved fine-tuning

arXiv.org Artificial Intelligence

Recent advances in artificial intelligence have been fueled by the development of foundation models such as BERT, GPT, T5, and Vision Transformers. These models are first pretrained on vast and diverse datasets and then adapted to specific downstream tasks, often with significantly less data. However, the mechanisms behind the success of this ubiquitous pretrain-then-adapt paradigm remain underexplored, particularly the characteristics of pretraining checkpoints that lend themselves to good downstream adaptation. We introduce a Bayesian model selection criterion, called the downstream free energy, which quantifies a checkpoint's adaptability by measuring the concentration of nearby favorable parameters for the downstream task. We demonstrate that this free energy criterion can be effectively implemented without access to the downstream data or prior knowledge of the downstream task. Furthermore, we provide empirical evidence that the free energy criterion reliably correlates with improved fine-tuning performance, offering a principled approach to predicting model adaptability. The advent of foundation models has significantly reshaped the landscape of modern machine learning (Bommasani et al., 2021).


Everything Everywhere All at Once: LLMs can In-Context Learn Multiple Tasks in Superposition

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have demonstrated remarkable in-context learning (ICL) capabilities. In this study, we explore a surprising phenomenon related to ICL: LLMs can perform multiple, computationally distinct ICL tasks simultaneously, during a single inference call, a capability we term "task superposition". We provide empirical evidence of this phenomenon across various LLM families and scales and show that this phenomenon emerges even if we train the model to in-context learn one task at a time. We offer theoretical explanations that this capability is well within the expressive power of transformers. We also explore how LLMs internally compose task vectors during superposition. Furthermore, we show that larger models can solve more ICL tasks in parallel, and better calibrate their output distribution. Our findings offer insights into the latent capabilities of LLMs, further substantiate the perspective of "LLMs as superposition of simulators", and raise questions about the mechanisms enabling simultaneous task execution.


Scalable Inference for Bayesian Multinomial Logistic-Normal Dynamic Linear Models

arXiv.org Machine Learning

Many scientific fields collect longitudinal multivariate count data where the total number of counts is arbitrary (e.g., multinomial observations). These data are often called count compositional as the information in the data relates to the relative frequencies of the categories (Silverman et al., 2018). These data occur frequently in molecular biology (Espinoza et al., 2020), microbiome studies (Silverman et al., 2018; Joseph et al., 2020; Äijö et al., 2018), natural language processing (Linderman et al., 2015), biomedicine (Fokianos and Kedem, 2003), and social sciences (Cargnoni et al., 1997). Although the counting process used to collect these data is often modeled as multinomial, other sources of noise in the system being studied often lead to extra-multinomial variation. While some account for this extra-multinomial variability with multinomial-Dirichlet models (Mosimann, 1962), multinomial logistic-normal models are often superior, as they can account for both positive and negative covariation between multinomial categories (Aitchison and Shen, 1980; Cargnoni et al., 1997; Joseph et al., 2020; Silverman et al., 2018). Moreover, under suitable transformation (i.e., link function), the logistic-normal is multivariate Gaussian.
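The last point can be illustrated with a short sketch. It uses the additive log-ratio (ALR) transform with the last category as reference, which is one common choice of link; the abstract does not say which transform the authors use. The independent Gaussian components in `sample_composition` (a diagonal covariance) and all function names are simplifying assumptions.

```python
import math
import random

def alr(p):
    """Additive log-ratio transform: log of each proportion relative to the last category."""
    return [math.log(x / p[-1]) for x in p[:-1]]

def alr_inverse(eta):
    """Map an unconstrained vector back to the simplex (softmax with a zero for the reference)."""
    m = max(max(eta), 0.0)  # shift by the largest logit for numerical stability
    exps = [math.exp(e - m) for e in eta] + [math.exp(-m)]
    total = sum(exps)
    return [e / total for e in exps]

def sample_composition(mean, sd, rng):
    """Draw eta from independent Gaussians (assumed diagonal covariance), then map to proportions."""
    eta = [rng.gauss(mu, s) for mu, s in zip(mean, sd)]
    return alr_inverse(eta)

# Hypothetical three-category example.
rng = random.Random(1)
proportions = sample_composition([0.0, 0.0], [1.0, 1.0], rng)
round_trip = alr(alr_inverse([0.5, -1.0]))
```

Because the Gaussian lives on the transformed scale, a full covariance matrix there can encode both positive and negative dependence between categories, which is the advantage over multinomial-Dirichlet models noted in the abstract.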