AITopics | Bayesian Learning

Bayesian Learning

A Bayesian network, Bayes network, belief network, Bayes(ian) model or probabilistic directed acyclic graphical model is a probabilistic graphical model (a type of statistical model) that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG). (Wikipedia)

Enhancing Hallucination Detection through Noise Injection

Liu, Litian, Pourreza, Reza, Panchal, Sunny, Bhattacharyya, Apratim, Qin, Yao, Memisevic, Roland

arXiv.org Artificial Intelligence, Feb-8-2025

Large Language Models (LLMs) are prone to generating plausible yet incorrect responses, known as hallucinations. Effectively detecting hallucinations is therefore crucial for the safe deployment of LLMs. Recent research has linked hallucinations to model uncertainty, suggesting that hallucinations can be detected by measuring dispersion over answer distributions obtained from a set of samples drawn from a model. While drawing from the distribution over tokens defined by the model is a natural way to obtain samples, in this work, we argue that it is sub-optimal for the purpose of detecting hallucinations. We show that detection can be improved significantly by taking into account model uncertainty in the Bayesian sense. To this end, we propose a very simple and efficient approach that perturbs an appropriate subset of model parameters, or equivalently hidden unit activations, during sampling. We demonstrate its effectiveness across a wide range of datasets and model architectures.
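The dispersion idea described in this abstract can be sketched in a few lines. The snippet below is a toy illustration, not the authors' implementation: `answer_entropy` measures dispersion as the entropy of the empirical answer distribution, and `sample_answer` draws one answer after adding noise to a hidden activation vector. The linear `readout`, the choice of uniform noise, and the `noise_scale` parameter are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def answer_entropy(answers):
    """Entropy (in nats) of the empirical distribution over sampled answers."""
    counts = Counter(answers)
    total = len(answers)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def sample_answer(hidden, readout, noise_scale, rng):
    """Sample one answer index from a toy model with perturbed activations.

    hidden:  list of hidden activations (hypothetical toy model state)
    readout: list of weight rows, one row per candidate answer (assumption)
    """
    # Perturb the hidden activations; uniform noise is an assumption here.
    noisy = [h + rng.uniform(-noise_scale, noise_scale) for h in hidden]
    logits = [sum(w * h for w, h in zip(row, noisy)) for row in readout]
    # Numerically stable softmax weights, then sample from them.
    top = max(logits)
    weights = [math.exp(l - top) for l in logits]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]
```

Under this sketch, a larger entropy over repeated calls to `sample_answer` would be read as a sign of higher uncertainty; how to threshold it is left open.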

hallucination detection, large language model, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2502.03799

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Mexico > Mexico City > Mexico City (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry: Media (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

dynoGP: Deep Gaussian Processes for dynamic system identification

Benavoli, Alessio, Piga, Dario, Forgione, Marco, Zaffalon, Marco

arXiv.org Machine Learning, Feb-8-2025

In this work, we present a novel approach to system identification for dynamical systems, based on a specific class of Deep Gaussian Processes (Deep GPs). These models are constructed by interconnecting linear dynamic GPs (equivalent to stochastic linear time-invariant dynamical systems) and static GPs (to model static nonlinearities). Our approach combines the strengths of data-driven methods, such as those based on neural network architectures, with the ability to output a probability distribution. This offers a more comprehensive framework for system identification that includes uncertainty quantification. Using both simulated and real-world data, we demonstrate the effectiveness of the proposed approach.

artificial intelligence, identification, machine learning, (21 more...)

arXiv.org Machine Learning

2502.0562

Country:

Europe > Ireland > Leinster > County Dublin > Dublin (0.14)
Europe > Sweden > Uppsala County > Uppsala (0.04)
Oceania > Australia > Victoria (0.04)
(5 more...)

Genre: Research Report (1.00)

Industry: Energy (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)

Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems, Feb-7-2025, 21:45:20 GMT

SUMMARY: This paper studies the effect of noise correlation in some models of multi-output regression. It argues that a method that does not exploit the correlation, such as Ordinary Least Squares (OLS), may perform much worse than a method that does, such as Maximum Likelihood Estimation (MLE). For certain linear models studied in the paper (the Pooled model and Seemingly Unrelated Regression), the MLE requires joint optimization of the covariance and the regression weights. This is a non-convex problem. The Alternating Minimization (AltMin) algorithm is an approach to solving the problem by iteratively optimizing the covariance and the weights.
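The alternating scheme summarized above can be sketched as follows for a pooled model in which every example shares one weight vector. This is a minimal sketch under assumptions, not the paper's algorithm as published: the array shapes, the identity initialization, the fixed iteration count, and the function name `altmin_pooled` are all chosen here for illustration.

```python
import numpy as np


def altmin_pooled(X, Y, n_iters=20):
    """Alternate between a weight update and a covariance update.

    X: array of shape (n, m, d), one (m x d) design matrix per example
    Y: array of shape (n, m), one m-dimensional output per example
    Assumed model: Y[i] = X[i] @ w + noise, with noise ~ N(0, sigma).
    """
    n, m, d = X.shape
    sigma = np.eye(m)  # identity start makes the first weight update plain OLS
    for _ in range(n_iters):
        sigma_inv = np.linalg.inv(sigma)
        # Weight step: generalized least squares with the covariance held fixed.
        A = np.einsum("nmd,mk,nke->de", X, sigma_inv, X)
        b = np.einsum("nmd,mk,nk->d", X, sigma_inv, Y)
        w = np.linalg.solve(A, b)
        # Covariance step: empirical covariance of residuals with w held fixed.
        R = Y - X @ w
        sigma = R.T @ R / n
    return w, sigma
```

Each step has a closed form even though the joint problem is non-convex, which is the practical appeal of alternating between the two blocks of variables.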

author feedback and meta-review, correlation, export review, (6 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.58)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.58)

Review for NeurIPS paper: Explaining Naive Bayes and Other Linear Classifiers with Polynomial Time and Delay

Neural Information Processing Systems, Feb-7-2025, 20:55:55 GMT

Additional Feedback: It would be interesting to see a discussion of how this work compares to classes of knowledge bases that enable tractable abductive reasoning [1]. For example, is this result a special case of some known class/language? I just wanted to address the authors' request for specific references "that might cast doubt on the novelty of our work". Sorry for not being more concrete, but here are some specific references. David Eppstein. The polynomial-time enumeration algorithm proposed for Eq. 16 is basically subset sum, where we enumerate all subsets that sum to less than some threshold.
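The reviewer's reading of the enumeration step, listing all subsets whose sum stays below a threshold, can be illustrated with a short generator. This is a sketch of that generic subset-sum enumeration only; Eq. 16 is not reproduced in this excerpt, so the connection to the paper is the reviewer's. The function name `subsets_below` and the restriction to non-negative weights are assumptions.

```python
def subsets_below(weights, threshold):
    """Yield index lists of all subsets whose total weight is < threshold.

    Assumes every weight is non-negative, so sorting by weight allows
    early termination once one item no longer fits.
    """
    items = sorted(enumerate(weights), key=lambda pair: pair[1])

    def extend(start, chosen, total):
        # Every call emits its current subset before exploring extensions.
        yield list(chosen)
        for pos in range(start, len(items)):
            idx, w = items[pos]
            if total + w >= threshold:
                break  # items are sorted, so all later ones are too heavy
            chosen.append(idx)
            yield from extend(pos + 1, chosen, total + w)
            chosen.pop()

    if threshold > 0:  # the empty subset has sum 0
        yield from extend(0, [], 0)
```

Because each recursive call outputs a subset before doing further work, the time between consecutive outputs stays bounded by a polynomial in the number of items, which is the sense of "polynomial delay" used in the paper title.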

linear classifier, neurips paper, polynomial time and delay, (9 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.40)

Review for NeurIPS paper: A Limitation of the PAC-Bayes Framework

Neural Information Processing Systems, Feb-7-2025, 20:29:45 GMT

Weaknesses: The paper is technically heavy for my expertise, so I can only raise questions about its content. Even if they are naive, discussing them in the paper would help other readers understand the scope of this work. A first concern is that the paper presents only the PAC-Bayes bound of McAllester (1999) (Theorem 1), which converges at rate sqrt(1/m). Since this pioneering work, many variations on the PAC-Bayes bounds have been proposed. Notably, Seeger (2002)'s and Catoni (2007)'s bounds are known to converge at rate 1/m when the empirical risk is zero (see also Guedj (2019) for an up-to-date overview of the PAC-Bayes literature).
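The rate difference the reviewer points to can be made concrete numerically. The sketch below uses one commonly quoted form of each bound, with the ln(2 sqrt(m) / delta) constant due to Maurer; constants vary across papers, so this is an assumption and not necessarily the form stated in Theorem 1 of the paper under review. The function names are hypothetical.

```python
import math


def mcallester_style_bound(emp_risk, kl, m, delta):
    """Square-root form: emp_risk + sqrt((KL + ln(2 sqrt(m)/delta)) / (2m))."""
    complexity = kl + math.log(2 * math.sqrt(m) / delta)
    return emp_risk + math.sqrt(complexity / (2 * m))


def seeger_style_bound_zero_risk(kl, m, delta):
    """kl-form bound specialised to zero empirical risk.

    The kl form reads kl(emp_risk || risk) <= c with
    c = (KL + ln(2 sqrt(m)/delta)) / m.
    For emp_risk = 0, kl(0 || p) = -ln(1 - p), hence p <= 1 - exp(-c) <= c.
    """
    c = (kl + math.log(2 * math.sqrt(m) / delta)) / m
    return 1 - math.exp(-c)
```

With `kl=5`, `m=10000` and `delta=0.05`, the zero-risk kl form gives a bound more than an order of magnitude smaller than the square-root form, which is the gap between rates 1/m and sqrt(1/m) up to logarithmic factors.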

classifier, neurips paper, pac-bayes framework, (7 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.35)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.35)

Review for NeurIPS paper: Instance Based Approximations to Profile Maximum Likelihood

Neural Information Processing Systems, Feb-7-2025, 17:51:31 GMT

Summary and Contributions: Statistical property estimation is an important and active area at the intersection of theoretical computer science, statistics, and information theory. For example, a basic question in this realm is: given n iid samples from an unknown discrete distribution p, how well can we estimate the entropy H(p), and what is an efficient algorithm for doing so? Recent efforts have shown that, for any symmetric property, the profile maximum likelihood estimator is universally minimax optimal for a wide range of parameters. While this at first seemed like a purely theoretical result, algorithmic efforts quickly caught up to show that 1) efficient approximation of the profile maximum likelihood estimator is possible and 2) approximate profile maximum likelihood estimation suffices for minimax optimality. In this context, this paper refines recent approximation algorithms from exp(-\sqrt{n} log n) to exp(-k log n), where k is the number of observed frequencies, with k = O(\sqrt{n}).
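To make the terms in this summary concrete, the sketch below computes the profile of a sample (how many symbols appear once, twice, and so on) and evaluates the naive plug-in entropy from the profile alone. It illustrates why a symmetric property depends only on the profile; it is not the profile maximum likelihood estimator or the paper's approximation algorithm. The function names are hypothetical.

```python
import math
from collections import Counter


def profile(sample):
    """Map each observed frequency f to the number of symbols seen f times."""
    symbol_counts = Counter(sample)
    return Counter(symbol_counts.values())


def plugin_entropy_from_profile(prof, n):
    """Plug-in entropy (nats) computed from the profile and sample size n.

    Symbol identities are never used, only how often each frequency occurs.
    """
    return -sum(mult * (f / n) * math.log(f / n) for f, mult in prof.items())
```

For the sample `"aabbc"` the profile has two distinct frequencies (1 and 2), so k = 2 in the notation of the summary. Since distinct frequencies 1, 2, ..., k already require at least k(k+1)/2 samples, k can never exceed roughly \sqrt{2n}.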

approximation, neurips paper, profile maximum likelihood, (3 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Review for NeurIPS paper: Instance Based Approximations to Profile Maximum Likelihood

Neural Information Processing Systems, Feb-7-2025, 17:51:23 GMT

This paper proposes new and substantial improvements to the algorithmic side of the PML estimation problem. New theoretical tools are introduced, and the analysis is refined and deep. The authors seem to have adequately addressed all of the concerns in the rebuttal.

approximation, neurips paper, profile maximum likelihood

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.40)

Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems, Feb-7-2025, 11:29:12 GMT

This paper is about a new Bayesian method for multi-label learning. The goal is to classify accurately in settings where there are many potential labels but only a few of them apply to each data point. The basis of the new results is a new generative model for the label vector of each example. Specifically, the label vector y_n of the n-th example is generated as y_n = f(V(\sigma(Wx_n))), where Wx_n is a lower-dimensional projection of the n-th instance x_n, followed by an element-wise sigmoid activation \sigma. The final operation f corresponds to drawing Poisson random variables with rates given by V(\sigma(Wx_n)) and thresholding these so-called latent counts by taking the minimum with 1.
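The generative process described in this review can be sketched directly with NumPy. This is a minimal sampling sketch only, with no inference: the matrix shapes, the requirement that V be non-negative (so that the Poisson rates are valid), and the function names are assumptions not stated in the excerpt.

```python
import numpy as np


def label_probabilities(x, W, V):
    """P(y = 1) per label: a Poisson count with rate r is >= 1 w.p. 1 - exp(-r)."""
    hidden = 1.0 / (1.0 + np.exp(-(W @ x)))  # element-wise sigmoid of W x
    rates = V @ hidden                        # one Poisson rate per label
    return 1.0 - np.exp(-rates)


def sample_labels(x, W, V, rng):
    """Draw a binary label vector y = min(1, Poisson(V sigmoid(W x)))."""
    hidden = 1.0 / (1.0 + np.exp(-(W @ x)))
    rates = V @ hidden
    counts = rng.poisson(rates)               # latent counts
    return np.minimum(counts, 1)              # threshold at 1
```

When the rates are small, most latent counts are zero, which matches the setting described above where only a few of the many labels apply to each data point.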

author feedback and meta-review, discussion, export review, (2 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.39)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.39)

Probabilistic Artificial Intelligence

Krause, Andreas, Hübotter, Jonas

arXiv.org Artificial Intelligence, Feb-7-2025

Artificial intelligence commonly refers to the science and engineering of artificial systems that can carry out tasks generally associated with requiring aspects of human intelligence, such as playing games, translating languages, and driving cars. In recent years, there have been exciting advances in learning-based, data-driven approaches towards AI, and machine learning and deep learning have enabled computer systems to perceive the world in unprecedented ways. Reinforcement learning has enabled breakthroughs in complex games such as Go and challenging robotics tasks such as quadrupedal locomotion. A key aspect of intelligence is to not only make predictions, but reason about the uncertainty in these predictions, and to consider this uncertainty when making decisions. This is what this manuscript on "Probabilistic Artificial Intelligence" is about. The first part covers probabilistic approaches to machine learning. We discuss the differentiation between "epistemic" uncertainty due to lack of data and "aleatoric" uncertainty, which is irreducible and stems, e.g., from noisy observations and outcomes. We discuss concrete approaches towards probabilistic inference and modern approaches to efficient approximate inference. The second part of the manuscript is about taking uncertainty into account in sequential decision tasks. We consider active learning and Bayesian optimization -- approaches that collect data by proposing experiments that are informative for reducing the epistemic uncertainty. We then consider reinforcement learning and modern deep RL approaches that use neural network function approximation. We close by discussing modern approaches in model-based RL, which harness epistemic and aleatoric uncertainty to guide exploration, while also reasoning about safety.

bayesian inference, machine learning, optimization problem, (19 more...)

arXiv.org Artificial Intelligence

2502.05244

Country:

Europe (0.67)
North America > United States (0.67)

Genre:

Research Report > New Finding (0.45)
Research Report > Experimental Study (0.45)

Industry:

Leisure & Entertainment (1.00)
Health & Medicine (1.00)
Energy > Oil & Gas > Upstream (1.00)
Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
(6 more...)

In-context denoising with one-layer transformers: connections between attention and associative memory retrieval

Smart, Matthew, Bietti, Alberto, Sengupta, Anirvan M.

arXiv.org Artificial Intelligence, Feb-7-2025

We introduce in-context denoising, a task that refines the connection between attention-based architectures and dense associative memory (DAM) networks, also known as modern Hopfield networks. Using a Bayesian framework, we show theoretically and empirically that certain restricted denoising problems can be solved optimally even by a single-layer transformer. We demonstrate that a trained attention layer processes each denoising prompt by performing a single gradient descent update on a context-aware DAM energy landscape, where context tokens serve as associative memories and the query token acts as an initial state. This one-step update yields better solutions than exact retrieval of either a context token or a spurious local minimum, providing a concrete example of DAM networks extending beyond the standard retrieval paradigm. Overall, this work solidifies the link between associative memory and attention mechanisms first identified by Ramsauer et al., and demonstrates the relevance of associative memory models in the study of in-context learning.
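The link between one gradient step and attention mentioned in this abstract can be checked numerically on a standard dense associative memory energy. The sketch below assumes the log-sum-exp energy E(s) = 0.5 ||s||^2 - (1/beta) log sum_i exp(beta x_i . s) used for modern Hopfield networks, with unit step size; the paper's exact energy, scaling, and step size may differ. All function names are hypothetical.

```python
import numpy as np


def softmax(z):
    z = z - z.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()


def dam_energy(s, X, beta):
    """Assumed energy; rows of X are context tokens acting as memories."""
    scores = beta * (X @ s)
    top = scores.max()
    log_sum_exp = top + np.log(np.exp(scores - top).sum())
    return 0.5 * (s @ s) - log_sum_exp / beta


def dam_grad(s, X, beta):
    """Gradient of dam_energy with respect to the state s."""
    return s - X.T @ softmax(beta * (X @ s))


def gradient_step(s, X, beta, step=1.0):
    """One gradient descent update starting from the query s."""
    return s - step * dam_grad(s, X, beta)


def softmax_attention(query, X, beta):
    """Single-query attention with keys and values both equal to X."""
    return X.T @ softmax(beta * (X @ query))
```

With `step=1.0` the update cancels the query term and returns exactly the attention output, so under these assumptions a single attention layer and a single descent step on the energy coincide.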

artificial intelligence, machine learning, transformer, (15 more...)

arXiv.org Artificial Intelligence

2502.05164

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > New Jersey > Middlesex County > Piscataway (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Austria (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Systems & Languages > Programming Languages (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
(2 more...)
