AITopics | Bayesian Learning

Collaborating Authors

Bayesian Learning

A Bayesian network, Bayes network, belief network, Bayes(ian) model or probabilistic directed acyclic graphical model is a probabilistic graphical model (a type of statistical model) that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG). (Wikipedia)

News Overviews Instructional Materials AI-Alerts Classics

Beyond Behavior Cloning: Robustness through Interactive Imitation and Contrastive Learning

Li, Zhaoting, Pérez-Dattari, Rodrigo, Babuska, Robert, Della Santina, Cosimo, Kober, Jens

arXiv.org Artificial IntelligenceFeb-11-2025

Behavior cloning (BC) traditionally relies on demonstration data, assuming the demonstrated actions are optimal. This can lead to overfitting under noisy data, particularly when expressive models are used (e.g., the energy-based model in Implicit BC). To address this, we extend behavior cloning into an iterative process of optimal action estimation within the Interactive Imitation Learning framework. Specifically, we introduce Contrastive policy Learning from Interactive Corrections (CLIC). CLIC leverages human corrections to estimate a set of desired actions and optimizes the policy to select actions from this set. We provide theoretical guarantees for the convergence of the desired action set to optimal actions in both single and multiple optimal action cases. Extensive simulation and real-robot experiments validate CLIC's advantages over existing state-of-the-art methods, including stable training of energy-based models, robustness to feedback noise, and adaptability to diverse feedback types beyond demonstrations. Our code will be publicly available soon.

artificial intelligence, bayesian inference, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2502.07645

Country: Europe > Netherlands > South Holland > Delft (0.04)

Genre:

Research Report > Promising Solution (0.48)
Research Report > New Finding (0.46)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Add feedback

Variational Learning Induces Adaptive Label Smoothing

Yang, Sin-Han, Liu, Zhedong, Marconi, Gian Maria, Khan, Mohammad Emtiyaz

arXiv.org Artificial IntelligenceFeb-11-2025

We show that variational learning naturally induces an adaptive label smoothing where label noise is specialized for each example. Such label-smoothing is useful to handle examples with labeling errors and distribution shifts, but designing a good adaptivity strategy is not always easy. We propose to skip this step and simply use the natural adaptivity induced during the optimization of a variational objective. We show empirical results where a variational algorithm called IVON outperforms traditional label smoothing and yields adaptivity strategies similar to those of an existing approach. By connecting Bayesian methods to label smoothing, our work provides a new way to handle overconfident predictions.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2502.07273

Country:

Europe > France (0.04)
Europe > Austria (0.04)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)

Genre: Research Report > New Finding (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
(2 more...)

Add feedback

Density Ratio Estimation with Conditional Probability Paths

Yu, Hanlin, Klami, Arto, Hyvärinen, Aapo, Korba, Anna, Chehab, Omar

arXiv.org Artificial IntelligenceFeb-11-2025

Density ratio estimation in high dimensions can be reframed as integrating a certain quantity, the time score, over probability paths which interpolate between the two densities. In practice, the time score has to be estimated based on samples from the two densities. However, existing methods for this problem remain computationally expensive and can yield inaccurate estimates. Inspired by recent advances in generative modeling, we introduce a novel framework for time score estimation, based on a conditioning variable. Choosing the conditioning variable judiciously enables a closed-form objective function. We demonstrate that, compared to previous approaches, our approach results in faster learning of the time score and competitive or better estimation accuracies of the density ratio on challenging tasks. Furthermore, we establish theoretical guarantees on the error of the estimated density ratio.

artificial intelligence, machine learning, time score, (17 more...)

arXiv.org Artificial Intelligence

2502.023

Country:

Europe > Finland > Uusimaa > Helsinki (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
(5 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.50)

Add feedback

Optimality in importance sampling: a gentle survey

Llorente, Fernando, Martino, Luca

arXiv.org Machine LearningFeb-11-2025

Monte Carlo (MC) methods are powerful tools for numerical inference and optimization widely employed in statistics, signal processing and machine learning Liu (2004); Robert and Casella (2004). They are mainly used for computing approximately the solution of definite integrals, and by extension, of differential equations (for this reason, MC schemes can be considered stochastic quadrature rules). Although exact analytical solutions to integrals are always desirable, such unicorns are rarely available, specially in real-world systems. Many applications inevitably require the approximation of intractable integrals. Specifically, Bayesian methods need the computation of expectations with respect to posterior probability density function (pdf) which, generally, are analytically intractable Gelman et al. (2013). The MC methods can be divided in four main families: direct methods (based on transformations or random variables), accept-reject techniques, Markov chain Monte Carlo (MCMC) algorithms, and importance sampling (IS) schemes Luengo et al. (2020); Martino et al. (2018). The last two families are the most popular for the facility and universality of their possible application Liang et al. (2010); Liu (2004); Robert and Casella (2004). All the MC methods require the choice of a suitable proposal density that is crucial for their performance Luengo et al. (2020); Robert and Casella (2004).

artificial intelligence, estimator, machine learning, (18 more...)

arXiv.org Machine Learning

2502.07396

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > New York > Suffolk County > Stony Brook (0.04)
Europe > United Kingdom > England (0.04)

Genre:

Research Report (0.40)
Instructional Material (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)

Add feedback

Model-free Methods for Event History Analysis and Efficient Adjustment (PhD Thesis)

Christgau, Alexander Mangulad

arXiv.org Machine LearningFeb-11-2025

This thesis contains a series of independent contributions to statistics, unified by a model-free perspective. The first chapter elaborates on how a model-free perspective can be used to formulate flexible methods that leverage prediction techniques from machine learning. Mathematical insights are obtained from concrete examples, and these insights are generalized to principles that permeate the rest of the thesis. The second chapter studies the concept of local independence, which describes whether the evolution of one stochastic process is directly influenced by another. To test local independence, we define a model-free parameter called the Local Covariance Measure (LCM). We formulate an estimator for the LCM, from which a test of local independence is proposed. We discuss how the size and power of the proposed test can be controlled uniformly and investigate the test in a simulation study. The third chapter focuses on covariate adjustment, a method used to estimate the effect of a treatment by accounting for observed confounding. We formulate a general framework that facilitates adjustment for any subset of covariate information. We identify the optimal covariate information for adjustment and, based on this, introduce the Debiased Outcome-adapted Propensity Estimator (DOPE) for efficient estimation of treatment effects. An instance of DOPE is implemented using neural networks, and we demonstrate its performance on simulated and real data. The fourth and final chapter introduces a model-free measure of the conditional association between an exposure and a time-to-event, which we call the Aalen Covariance Measure (ACM). We develop a model-free estimation method and show that it is doubly robust, ensuring $\sqrt{n}$-consistency provided that the nuisance functions can be estimated with modest rates. A simulation study demonstrates the use of our estimator in several settings.

artificial intelligence, data mining, machine learning, (20 more...)

arXiv.org Machine Learning

2502.07906

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Europe > Denmark > Capital Region > Copenhagen (0.04)
North America > United States > New York > New York County > New York City (0.04)
(8 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Overview (0.92)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Energy (0.67)
Education (0.65)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
(3 more...)

Add feedback

Singular leaning coefficients and efficiency in learning theory

Aoyagi, Miki

arXiv.org Machine LearningFeb-11-2025

Singular learning models with non-positive Fisher information matrices include neural networks, reduced-rank regression, Boltzmann machines, normal mixture models, and others. These models have been widely used in the development of learning machines. However, theoretical analysis is still in its early stages. In this paper, we examine learning coefficients, which indicate the general learning efficiency of deep linear learning models and three-layer neural network models with ReLU units. Finally, we extend the results to include the case of the Softmax function.

artificial intelligence, bayesian inference, machine learning, (19 more...)

arXiv.org Machine Learning

2501.12747

Country:

North America > United States > Rhode Island > Providence County > Providence (0.04)
North America > United States > Pennsylvania (0.04)
North America > United States > New York (0.04)
(3 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)
(2 more...)

Add feedback

Generative Modeling with Bayesian Sample Inference

Lienen, Marten, Kollovieh, Marcel, Günnemann, Stephan

arXiv.org Machine LearningFeb-11-2025

We derive a novel generative model from the simple act of Gaussian posterior inference. Treating the generated sample as an unknown variable to infer lets us formulate the sampling process in the language of Bayesian probability. Our model uses a sequence of prediction and posterior update steps to narrow down the unknown sample from a broad initial belief. In addition to a rigorous theoretical analysis, we establish a connection between our model and diffusion models and show that it includes Bayesian Flow Networks (BFNs) as a special case. In our experiments, we demonstrate improved performance over both BFNs and Variational Diffusion Models, achieving competitive likelihood scores on CIFAR10 and ImageNet.

artificial intelligence, bayesian inference, machine learning, (15 more...)

arXiv.org Machine Learning

2502.0758

Country: Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Genre: Research Report > New Finding (0.66)

Industry: Energy (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.66)

Add feedback

Adaptive kernel predictors from feature-learning infinite limits of neural networks

Lauditi, Clarissa, Bordelon, Blake, Pehlevan, Cengiz

arXiv.org Machine LearningFeb-11-2025

Previous influential work showed that infinite width limits of neural networks in the lazy training regime are described by kernel machines. Here, we show that neural networks trained in the rich, feature learning infinite-width regime in two different settings are also described by kernel machines, but with data-dependent kernels. For both cases, we provide explicit expressions for the kernel predictors and prescriptions to numerically calculate them. To derive the first predictor, we study the large-width limit of feature-learning Bayesian networks, showing how feature learning leads to task-relevant adaptation of layer kernels and preactivation densities. The saddle point equations governing this limit result in a min-max optimization problem that defines the kernel predictor. To derive the second predictor, we study gradient flow training of randomly initialized networks trained with weight decay in the infinite-width limit using dynamical mean field theory (DMFT). The fixed point equations of the arising DMFT defines the task-adapted internal representations and the kernel predictor. We compare our kernel predictors to kernels derived from lazy regime and demonstrate that our adaptive kernels achieve lower test loss on benchmark datasets.

artificial intelligence, machine learning, predictor, (16 more...)

arXiv.org Machine Learning

2502.07998

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.51)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

Robust Amortized Bayesian Inference with Self-Consistency Losses on Unlabeled Data

Mishra, Aayush, Habermann, Daniel, Schmitt, Marvin, Radev, Stefan T., Bürkner, Paul-Christian

arXiv.org Machine LearningFeb-11-2025

Neural amortized Bayesian inference (ABI) can solve probabilistic inverse problems orders of magnitude faster than classical methods. However, neural ABI is not yet sufficiently robust for widespread and safe applicability. In particular, when performing inference on observations outside of the scope of the simulated data seen during training, for example, because of model misspecification, the posterior approximations are likely to become highly biased. Due to the bad pre-asymptotic behavior of current neural posterior estimators in the out-of-simulation regime, the resulting estimation biases cannot be fixed in acceptable time by just simulating more training data. In this proof-of-concept paper, we propose a semi-supervised approach that enables training not only on (labeled) simulated data generated from the model, but also on unlabeled data originating from any source, including real-world data. To achieve the latter, we exploit Bayesian self-consistency properties that can be transformed into strictly proper losses without requiring knowledge of true parameter values, that is, without requiring data labels. The results of our initial experiments show remarkable improvements in the robustness of ABI on out-of-simulation data. Even if the observed data is far away from both labeled and unlabeled training data, inference remains highly accurate. If our findings also generalize to other scenarios and model classes, we believe that our new method represents a major breakthrough in neural ABI.

artificial intelligence, machine learning, self-consistency loss, (16 more...)

arXiv.org Machine Learning

2501.13483

Country:

North America > United States (0.14)
Europe > Germany (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Transportation (0.95)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.93)

Add feedback

SNAP: Sequential Non-Ancestor Pruning for Targeted Causal Effect Estimation With an Unknown Graph

Schubert, Mátyás, Claassen, Tom, Magliacane, Sara

arXiv.org Machine LearningFeb-11-2025

Causal discovery can be computationally demanding for large numbers of variables. If we only wish to estimate the causal effects on a small subset of target variables, we might not need to learn the causal graph for all variables, but only a small subgraph that includes the targets and their adjustment sets. In this paper, we focus on identifying causal effects between target variables in a computationally and statistically efficient way. This task combines causal discovery and effect estimation, aligning the discovery objective with the effects to be estimated. We show that definite non-ancestors of the targets are unnecessary to learn causal relations between the targets and to identify efficient adjustments sets. We sequentially identify and prune these definite non-ancestors with our Sequential Non-Ancestor Pruning (SNAP) framework, which can be used either as a preprocessing step to standard causal discovery methods, or as a standalone sound and complete causal discovery algorithm. Our results on synthetic and real data show that both approaches substantially reduce the number of independence tests and the computation time without compromising the quality of causal effect estimations.

artificial intelligence, machine learning, node number, (12 more...)

arXiv.org Machine Learning

2502.07857

Country:

North America > United States > New Mexico > Los Alamos County > Los Alamos (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)
North America > United States > California (0.04)
(4 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback