Goto

Collaborating Authors

 Bayesian Learning


Extracting Probabilistic Knowledge from Large Language Models for Bayesian Network Parameterization

arXiv.org Artificial Intelligence

In this work, we evaluate the potential of Large Language Models (LLMs) in building Bayesian Networks (BNs) by approximating domain expert priors. LLMs have demonstrated potential as factual knowledge bases; however, their capability to generate probabilistic knowledge about real-world events remains understudied. We explore utilizing the probabilistic knowledge inherent in LLMs to derive probability estimates for statements regarding events and their relationships within a BN. Using LLMs in this context allows for the parameterization of BNs, enabling probabilistic modeling within specific domains. Our experiments on eighty publicly available Bayesian Networks, from healthcare to finance, demonstrate that querying LLMs about the conditional probabilities of events provides meaningful results when compared to baselines, including random and uniform distributions, as well as approaches based on next-token generation probabilities. We explore how these LLM-derived distributions can serve as expert priors to refine distributions extracted from data, especially when data is scarce. Overall, this work introduces a promising strategy for automatically constructing Bayesian Networks by combining probabilistic knowledge extracted from LLMs with real-world data. Additionally, we establish the first comprehensive baseline for assessing LLM performance in extracting probabilistic knowledge.


Conformal Set-based Human-AI Complementarity with Multiple Experts

arXiv.org Artificial Intelligence

Decision support systems are designed to assist human experts in classification tasks by providing conformal prediction sets derived from a pre-trained model. This human-AI collaboration has demonstrated enhanced classification performance compared to using either the model or the expert independently. In this study, we focus on the selection of instance-specific experts from a pool of multiple human experts, contrasting it with existing research that typically focuses on single-expert scenarios. We characterize the conditions under which multiple experts can benefit from the conformal sets. With the insight that only certain experts may be relevant for each instance, we explore the problem of subset selection and introduce a greedy algorithm that utilizes conformal sets to identify the subset of expert predictions that will be used in classifying an instance. This approach is shown to yield better performance compared to naive methods for human subset selection. Based on real expert predictions from the CIFAR-10H and ImageNet-16H datasets, our simulation study indicates that our proposed greedy algorithm achieves near-optimal subsets, resulting in improved classification performance among multiple experts.



Stochastic Trace Optimization of Parameter Dependent Matrices Based on Statistical Learning Theory

arXiv.org Machine Learning

We consider matrices $\boldsymbol{A}(\boldsymbolθ)\in\mathbb{R}^{m\times m}$ that depend, possibly nonlinearly, on a parameter $\boldsymbolθ$ from a compact parameter space $Θ$. We present a Monte Carlo estimator for minimizing $\text{trace}(\boldsymbol{A}(\boldsymbolθ))$ over all $\boldsymbolθ\inΘ$, and determine the sampling amount so that the backward error of the estimator is bounded with high probability. We derive two types of bounds, based on epsilon nets and on generic chaining. Both types predict a small sampling amount for matrices $\boldsymbol{A}(\boldsymbolθ)$ with small offdiagonal mass, and parameter spaces $Θ$ of small ``size.'' Dependence on the matrix dimension~$m$ is only weak or not explicit. The bounds based on epsilon nets are easier to evaluate and come with fully specified constants. In contrast, the bounds based on chaining depend on the Talagrand functionals which are difficult to evaluate, except in very special cases. Comparisons between the two types of bounds are difficult, although the literature suggests that chaining bounds can be superior.


Identifiability of the minimum-trace directed acyclic graph and hill climbing algorithms without strict local optima under weakly increasing error variances

arXiv.org Machine Learning

We prove that the true underlying directed acyclic graph (DAG) in Gaussian linear structural equation models is identifiable as the minimum-trace DAG when the error variances are weakly increasing with respect to the true causal ordering. This result bridges two existing frameworks as it extends the identifiable cases within the minimum-trace DAG method and provides a principled interpretation of the algorithmic ordering search approach, revealing that its objective is actually to minimize the total residual sum of squares. On the computational side, we prove that the hill climbing algorithm with a random-to-random (R2R) neighborhood does not admit any strict local optima. Under standard settings, we confirm the result through extensive simulations, observing only a few weak local optima. Interestingly, algorithms using other neighborhoods of equal size exhibit suboptimal behavior, having strict local optima and a substantial number of weak local optima.


Uncertainty-aware Accurate Elevation Modeling for Off-road Navigation via Neural Processes

arXiv.org Artificial Intelligence

Terrain elevation modeling for off-road navigation aims to accurately estimate changes in terrain geometry in real-time and quantify the corresponding uncertainties. Having precise estimations and uncertainties plays a crucial role in planning and control algorithms to explore safe and reliable maneuver strategies. However, existing approaches, such as Gaussian Processes (GPs) and neural network-based methods, often fail to meet these needs. They are either unable to perform in real-time due to high computational demands, underestimating sharp geometry changes, or harming elevation accuracy when learned with uncertainties. Recently, Neural Processes (NPs) have emerged as a promising approach that integrates the Bayesian uncertainty estimation of GPs with the efficiency and flexibility of neural networks. Inspired by NPs, we propose an effective NP-based method that precisely estimates sharp elevation changes and quantifies the corresponding predictive uncertainty without losing elevation accuracy. Our method leverages semantic features from LiDAR and camera sensors to improve interpolation and extrapolation accuracy in unobserved regions. Also, we introduce a local ball-query attention mechanism to effectively reduce the computational complexity of global attention by 17\% while preserving crucial local and spatial information. We evaluate our method on off-road datasets having interesting geometric features, collected from trails, deserts, and hills. Our results demonstrate superior performance over baselines and showcase the potential of neural processes for effective and expressive terrain modeling in complex off-road environments.



RCUKF: Data-Driven Modeling Meets Bayesian Estimation

arXiv.org Machine Learning

Accurate modeling is crucial in many engineering and scientific applications, yet obtaining a reliable process model for complex systems is often challenging. To address this challenge, we propose a novel framework, reservoir computing with unscented Kalman filtering (RCUKF), which integrates data-driven modeling via reservoir computing (RC) with Bayesian estimation through the unscented Kalman filter (UKF). The RC component learns the nonlinear system dynamics directly from data, serving as a surrogate process model in the UKF prediction step to generate state estimates in high-dimensional or chaotic regimes where nominal mathematical models may fail. Meanwhile, the UKF measurement update integrates real-time sensor data to correct potential drift in the data-driven model. We demonstrate RCUKF effectiveness on well-known benchmark problems and a real-time vehicle trajectory estimation task in a high-fidelity simulation environment.


RDDPM: Robust Denoising Diffusion Probabilistic Model for Unsupervised Anomaly Segmentation

arXiv.org Machine Learning

Recent advancements in diffusion models have demonstrated significant success in unsupervised anomaly segmentation. For anomaly segmentation, these models are first trained on normal data; then, an anomalous image is noised to an intermediate step, and the normal image is reconstructed through backward diffusion. Unlike traditional statistical methods, diffusion models do not rely on specific assumptions about the data or target anomalies, making them versatile for use across different domains. However, diffusion models typically assume access to normal data for training, limiting their applicability in realistic settings. In this paper, we propose novel robust denoising diffusion models for scenarios where only contaminated (i.e., a mix of normal and anomalous) unlabeled data is available. By casting maximum likelihood estimation of the data as a nonlinear regression problem, we reinterpret the denoising diffusion probabilistic model through a regression lens. Using robust regression, we derive a robust version of denoising diffusion probabilistic models. Our novel framework offers flexibility in constructing various robust diffusion models. Our experiments show that our approach outperforms current state of the art diffusion models, for unsupervised anomaly segmentation when only contaminated data is available. Our method outperforms existing diffusion-based approaches, achieving up to 8.08\% higher AUROC and 10.37\% higher AUPRC on MVTec datasets. The implementation code is available at: https://github.com/mehrdadmoradi124/RDDPM


A Comprehensive Framework for Uncertainty Quantification of Voxel-wise Supervised Models in IVIM MRI

arXiv.org Artificial Intelligence

Accurate estimation of intravoxel incoherent motion (IVIM) parameters from diffusion-weighted MRI remains challenging due to the ill-posed nature of the inverse problem and high sensitivity to noise, particularly in the perfusion compartment. In this work, we propose a probabilistic deep learning framework based on Deep Ensembles (DE) of Mixture Density Networks (MDNs), enabling estimation of total predictive uncertainty and decomposition into aleatoric (AU) and epistemic (EU) components. The method was benchmarked against non probabilistic neural networks, a Bayesian fitting approach and a probabilistic network with single Gaussian parametrization. Supervised training was performed on synthetic data, and evaluation was conducted on both simulated and an in vivo dataset. The reliability of the quantified uncertainties was assessed using calibration curves, output distribution sharpness, and the Continuous Ranked Probability Score (CRPS). MDNs produced more calibrated and sharper predictive distributions for the diffusion coefficient D and fraction f parameters, although slight overconfidence was observed in pseudo-diffusion coefficient D*. The Robust Coefficient of Variation (RCV) indicated smoother in vivo estimates for D* with MDNs compared to Gaussian model. Despite the training data covering the expected physiological range, elevated EU in vivo suggests a mismatch with real acquisition conditions, highlighting the importance of incorporating EU, which was allowed by DE. Overall, we present a comprehensive framework for IVIM fitting with uncertainty quantification, which enables the identification and interpretation of unreliable estimates. The proposed approach can also be adopted for fitting other physical models through appropriate architectural and simulation adjustments.