AITopics | Bayesian Inference

Collaborating Authors

Bayesian Inference

Bayes' Theorem allows a program to infer the probabilities of likely causes from the probabilities of their effects, when what it is given are the probabilities of effects, given the causes.

News Overviews Instructional Materials AI-Alerts Classics

Learning Generative Vision Transformer with Energy-Based Latent Space for Saliency Prediction

Neural Information Processing SystemsOct-11-2024, 12:43:16 GMT

Vision transformer networks have shown superiority in many computer vision tasks. In this paper, we take a step further by proposing a novel generative vision transformer with latent variables following an informative energy-based prior for salient object detection. Both the vision transformer network and the energy-based prior model are jointly trained via Markov chain Monte Carlo-based maximum likelihood estimation, in which the sampling from the intractable posterior and prior distributions of the latent variables are performed by Langevin dynamics. Further, with the generative vision transformer, we can easily obtain a pixel-wise uncertainty map from an image, which indicates the model confidence in predicting saliency from the image. Different from the existing generative models which define the prior distribution of the latent variables as a simple isotropic Gaussian distribution, our model uses an energy-based informative prior which can be more expressive to capture the latent space of the data.

energy-based latent space, learning generative vision transformer, saliency prediction, (4 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.63)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.63)

Add feedback

Can I Trust My Fairness Metric? Assessing Fairness with Unlabeled Data and Bayesian Inference

Neural Information Processing SystemsOct-11-2024, 12:17:54 GMT

Group fairness is measured via parity of quantitative metrics across different protected demographic groups. In this paper, we investigate the problem of reliably assessing group fairness metrics when labeled examples are few but unlabeled examples are plentiful. We propose a general Bayesian framework that can augment labeled data with unlabeled data to produce more accurate and lower-variance estimates compared to methods based on labeled data alone. Our approach estimates calibrated scores (for unlabeled examples) of each group using a hierarchical latent variable model conditioned on labeled examples. This in turn allows for inference of posterior distributions for an array of group fairness metrics with a notion of uncertainty.

fairness metric, group fairness metric, unlabeled data and bayesian inference, (1 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.81)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.65)

Add feedback

Automatic Variational Inference in Stan

Neural Information Processing SystemsOct-11-2024, 11:49:51 GMT

Variational inference is a scalable technique for approximate Bayesian inference. Deriving variational inference algorithms requires tedious model-specific calculations; this makes it difficult for non-experts to use. We propose an automatic variational inference algorithm, automatic differentiation variational inference (ADVI); we implement it in Stan (code available), a probabilistic programming system. In ADVI the user provides a Bayesian model and a dataset, nothing else. We make no conjugacy assumptions and support a broad class of models.

advi, automatic variational inference, stan, (1 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)

Add feedback

X-CAL: Explicit Calibration for Survival Analysis

Neural Information Processing SystemsOct-11-2024, 11:20:35 GMT

When a model's predicted number of events within any time interval is similar to the observed number, it is called well-calibrated. A survival model's calibration can be measured using, for instance, distributional calibration (D-CALIBRATION) [Haider et al., 2020] which computes the squared difference between the observed and predicted number of events within different time intervals. Classically, calibration is addressed in post-training analysis. We develop explicit calibration (X-CAL), which turns D-CALIBRATION into a differentiable objective that can be used in survival modeling alongside maximum likelihood estimation and other objectives. X-CAL allows us to directly optimize calibration and strike a desired trade-off between predictive power and calibration.

calibration, explicit calibration, survival analysis, (5 more...)

Neural Information Processing Systems

Industry: Health & Medicine > Therapeutic Area > Oncology (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.62)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.62)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.44)

Add feedback

Fast Bayesian Inference with Batch Bayesian Quadrature via Kernel Recombination

Neural Information Processing SystemsOct-11-2024, 10:58:35 GMT

Calculation of Bayesian posteriors and model evidences typically requires numerical integration. Bayesian quadrature (BQ), a surrogate-model-based approach to numerical integration, is capable of superb sample efficiency, but its lack of parallelisation has hindered its practical applications. In this work, we propose a parallelised (batch) BQ method, employing techniques from kernel quadrature, that possesses an empirically exponential convergence rate.Additionally, just as with Nested Sampling, our method permits simultaneous inference of both posteriors and model evidence.Samples from our BQ surrogate model are re-selected to give a sparse set of samples, via a kernel recombination algorithm, requiring negligible additional time to increase the batch size.Empirically, we find that our approach significantly outperforms the sampling efficiency of both state-of-the-art BQ techniques and Nested Sampling in various real-world datasets, including lithium-ion battery analytics.

batch bayesian quadrature, fast bayesian inference, kernel recombination, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.40)

Add feedback

Replica-Exchange Nos\'e-Hoover Dynamics for Bayesian Learning on Large Datasets

Neural Information Processing SystemsOct-11-2024, 10:00:56 GMT

In this paper, we present a new practical method for Bayesian learning that can rapidly draw representative samples from complex posterior distributions with multiple isolated modes in the presence of mini-batch noise. This is achieved by simulating a collection of replicas in parallel with different temperatures and periodically swapping them. When evolving the replicas' states, the Nos\'e-Hoover dynamics is applied, which adaptively neutralizes the mini-batch noise. To perform proper exchanges, a new protocol is developed with a noise-aware test of acceptance, by which the detailed balance is reserved in an asymptotic way. While its efficacy on complex multimodal posteriors has been illustrated by testing over synthetic distributions, experiments with deep Bayesian neural networks on large-scale datasets have shown its significant improvements over strong baselines.

bayesian learning, dataset, e-hoover dynamic, (2 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)

Add feedback

Maximum Likelihood Learning With Arbitrary Treewidth via Fast-Mixing Parameter Sets

Neural Information Processing SystemsOct-11-2024, 10:00:12 GMT

Inference is typically intractable in high-treewidth undirected graphical models, making maximum likelihood learning a challenge. One way to overcome this is to restrict parameters to a tractable set, most typically the set of tree-structured parameters. This paper explores an alternative notion of a tractable set, namely a set of "fast-mixing parameters" where Markov chain Monte Carlo (MCMC) inference can be guaranteed to quickly converge to the stationary distribution. While it is common in practice to approximate the likelihood gradient using samples obtained from MCMC, such procedures lack theoretical guarantees. This paper proves that for any exponential family with bounded sufficient statistics, (not just graphical models) when parameters are constrained to a fast-mixing set, gradient descent with gradients approximated by sampling will approximate the maximum likelihood solution inside the set with high-probability.

arbitrary treewidth, fast-mixing parameter set, maximum likelihood learning, (3 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.90)

Add feedback

Implicit MLE: Backpropagating Through Discrete Exponential Family Distributions

Neural Information Processing SystemsOct-11-2024, 08:22:30 GMT

Combining discrete probability distributions and combinatorial optimization problems with neural network components has numerous applications but poses several challenges. We propose Implicit Maximum Likelihood Estimation (I-MLE), a framework for end-to-end learning of models combining discrete exponential family distributions and differentiable neural components. I-MLE is widely applicable as it only requires the ability to compute the most probable states and does not rely on smooth relaxations. The framework encompasses several approaches such as perturbation-based implicit differentiation and recent methods to differentiate through black-box combinatorial solvers. We introduce a novel class of noise distributions for approximating marginals via perturb-and-MAP.

backpropagating, discrete exponential family distribution, implicit mle, (3 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.77)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.77)

Add feedback

Independence Testing for Bounded Degree Bayesian Networks

Neural Information Processing SystemsOct-11-2024, 07:28:57 GMT

We study the following independence testing problem: given access to samples from a distribution P over \{0,1\} n, decide whether P is a product distribution or whether it is \varepsilon -far in total variation distance from any product distribution. For arbitrary distributions, this problem requires \exp(n) samples. We show in this work that if P has a sparse structure, then in fact only linearly many samples are required.Specifically, if P is Markov with respect to a Bayesian network whose underlying DAG has in-degree bounded by d, then \tilde{\Theta}(2 {d/2}\cdot n/\varepsilon 2) samples are necessary and sufficient for independence testing.

bounded degree bayesian network, independence testing, product distribution

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)

Add feedback

The Broad Optimality of Profile Maximum Likelihood

Neural Information Processing SystemsOct-11-2024, 07:09:23 GMT

We study three fundamental statistical-learning problems: distribution estimation, property estimation, and property testing. We establish the profile maximum likelihood (PML) estimator as the first unified sample-optimal approach to a wide range of learning tasks. In particular, for every alphabet size k and desired accuracy \varepsilon: \textbf{Distribution estimation} Under \ell_1 distance, PML yields optimal \Theta(k/(\varepsilon 2\log k)) sample complexity for sorted-distribution estimation, and a PML-based estimator empirically outperforms the Good-Turing estimator on the actual distribution; \textbf{Additive property estimation} For a broad class of additive properties, the PML plug-in estimator uses just four times the sample size required by the best estimator to achieve roughly twice its error, with exponentially higher confidence; \textbf{ \alpha -R\'enyi entropy estimation} For an integer \alpha 1, the PML plug-in estimator has optimal k {1-1/\alpha} sample complexity; for non-integer \alpha 3/4, the PML plug-in estimator has sample complexity lower than the state of the art; \textbf{Identity testing} In testing whether an unknown distribution is equal to or at least \varepsilon far from a given distribution in \ell_1 distance, a PML-based tester achieves the optimal sample complexity up to logarithmic factors of k . With minor modifications, most of these results also hold for a near-linear-time computable variant of PML.

estimation, estimator, sample complexity, (5 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.64)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.64)

Add feedback