Goto

Collaborating Authors

 Uncertainty


Bayesian Network Structure Discovery Using Large Language Models

arXiv.org Artificial Intelligence

Understanding probabilistic relationships among variables is crucial for analyzing complex systems. Traditional structure learning methods often require extensive observational data and incur high computational costs. Recent studies have explored using large language models (LLMs) for structure learning, but most treat LLMs as auxiliary tools for pre-processing or post-processing, leaving the core learning process data-driven. In this work, we propose a unified framework for Bayesian network structure discovery that places LLMs at the center, supporting both data-free and data-aware settings. In the data-free case, we introduce \textbf{PromptBN} to query LLMs with metadata and efficiently uncover valid probabilistic relationships. When observational data are available, we introduce \textbf{ReActBN}, which integrates the ReAct reasoning paradigm with structure scores such as the Bayesian Information Criterion (BIC) for iterative refinement. Unlike prior methods that offload refinement to external algorithms, our framework maintains the LLM actively in the loop throughout the discovery process. Experiments demonstrate that our method significantly outperforms both existing LLM-based approaches and traditional data-driven algorithms, particularly in the low- or no-data scenario. Code is publicly available at {\texttt{\textcolor{magenta}{https://github.com/sherryzyh/prompt2bn}}}.


Balancing Interpretability and Performance in Motor Imagery EEG Classification: A Comparative Study of ANFIS-FBCSP-PSO and EEGNet

arXiv.org Artificial Intelligence

Achieving both accurate and interpretable classification of motor imagery EEG remains a key challenge in brain computer interface (BCI) research. This paper compares a transparent fuzzy reasoning approach (ANFIS-FBCSP-PSO) with a deep learning benchmark (EEGNet) using the BCI Competition IV-2a dataset. The ANFIS pipeline combines filter bank common spatial pattern feature extraction with fuzzy IF-THEN rules optimized via particle swarm optimization, while EEGNet learns hierarchical spatial temporal representations directly from raw EEG data. In within-subject experiments, the fuzzy neural model performed better (68.58 percent +/- 13.76 percent accuracy, kappa = 58.04 percent +/- 18.43), while in cross-subject (LOSO) tests, the deep model exhibited stronger generalization (68.20 percent +/- 12.13 percent accuracy, kappa = 57.33 percent +/- 16.22). The study provides practical guidance for selecting MI-BCI systems according to design goals: interpretability or robustness across users. Future investigations into transformer based and hybrid neuro symbolic frameworks are expected to advance transparent EEG decoding.


FGO MythBusters: Explaining how Kalman Filter variants achieve the same performance as FGO in navigation applications

arXiv.org Artificial Intelligence

Sliding window-factor graph optimization (SW-FGO) has gained more and more attention in navigation research due to its robust approximation to non-Gaussian noises and nonlinearity of measuring models. There are lots of works focusing on its application performance compared to extended Kalman filter (EKF) but there is still a myth at the theoretical relationship between the SW-FGO and EKF. In this paper, we find the necessarily fair condition to connect SW-FGO and Kalman filter variants (KFV) (e.g., EKF, iterative EKF (IEKF), robust EKF (REKF) and robust iterative EKF (RIEKF)). Based on the conditions, we propose a recursive FGO (Re-FGO) framework to represent KFV under SW-FGO formulation. Under explicit conditions (Markov assumption, Gaussian noise with L2 loss, and a one-state window), Re-FGO regenerates exactly to EKF/IEKF/REKF/RIEKF, while SW-FGO shows measurable benefits in nonlinear, non-Gaussian regimes at a predictable compute cost. Finally, after clarifying the connection between them, we highlight the unique advantages of SW-FGO in practical phases, especially on numerical estimation and deep learning integration. The code and data used in this work is open sourced at https://github.com/Baoshan-Song/KFV-FGO-Comparison.


Riemannian Consistency Model

arXiv.org Artificial Intelligence

Consistency models are a class of generative models that enable few-step generation for diffusion and flow matching models. While consistency models have achieved promising results on Euclidean domains like images, their applications to Riemannian manifolds remain challenging due to the curved geometry. In this work, we propose the Riemannian Consistency Model (RCM), which, for the first time, enables few-step consistency modeling while respecting the intrinsic manifold constraint imposed by the Riemannian geometry. Leveraging the covariant derivative and exponential-map-based parameterization, we derive the closed-form solutions for both discrete- and continuous-time training objectives for RCM. We then demonstrate theoretical equivalence between the two variants of RCM: Riemannian consistency distillation (RCD) that relies on a teacher model to approximate the marginal vector field, and Riemannian consistency training (RCT) that utilizes the conditional vector field for training. We further propose a simplified training objective that eliminates the need for the complicated differential calculation. Finally, we provide a unique kinematics perspective for interpreting the RCM objective, offering new theoretical angles. Through extensive experiments, we manifest the superior generative quality of RCM in few-step generation on various non-Euclidean manifolds, including flat-tori, spheres, and the 3D rotation group SO(3).


Khiops: An End-to-End, Frugal AutoML and XAI Machine Learning Solution for Large, Multi-Table Databases

arXiv.org Artificial Intelligence

Khiops is an open source machine learning tool designed for mining large multi-table databases. Khiops is based on a unique Bayesian approach that has attracted academic interest with more than 20 publications on topics such as variable selection, classification, decision trees and co-clustering. It provides a predictive measure of variable importance using discretisation models for numerical data and value clustering for categorical data. The proposed classification/regression model is a naive Bayesian classifier incorporating variable selection and weight learning. In the case of multi-table databases, it provides propositionalisation by automatically constructing aggregates. Khiops is adapted to the analysis of large databases with millions of individuals, tens of thousands of variables and hundreds of millions of records in secondary tables. It is available on many environments, both from a Python library and via a user interface.


Partial Trace-Class Bayesian Neural Networks

arXiv.org Machine Learning

Bayesian neural networks (BNNs) allow rigorous uncertainty quantification in deep learning, but often come at a prohibitive computational cost. We propose three different innovative architectures of partial trace-class Bayesian neural networks (PaTraC BNNs) that enable uncertainty quantification comparable to standard BNNs but use significantly fewer Bayesian parameters. These PaTraC BNNs have computational and statistical advantages over standard Bayesian neural networks in terms of speed and memory requirements. Our proposed methodology therefore facilitates reliable, robust, and scalable uncertainty quantification in neural networks. The three architectures build on trace-class neural network priors which induce an ordering of the neural network parameters, and are thus a natural choice in our framework. In a numerical simulation study, we verify the claimed benefits, and further illustrate the performance of our proposed methodology on a real-world dataset.


Few-Shot Multimodal Medical Imaging: A Theoretical Framework

arXiv.org Machine Learning

Medical imaging relies heavily on large, labeled datasets. But, unfortunately, they are not always easily accessible in clinical settings. Additionally, many practitioners often face various structural obstacles like limited data availability, fragmented data systems, and unbalanced datasets. These barriers often lead to the increased diagnostic uncertainty, underrepresentation of certain conditions, reduced model robustness, and biased diagnostic decisions. In response to these challenges, approaches such as transfer learning, meta-learning, and multimodal fusion have made great strides. However, they still need a solid theoretical justification for why they succeed or fail in situations where data is scarce. To address this gap, we propose a unified theoretical framework that characterizes learning and inference under low-resource medical imaging conditions. We first formalize the learning objective under few-shot conditions and compute sample complexity constraints to estimate the smallest quantity of data needed to achieve clinically reliable accuracy. Then based on ideas from PAC-learning and PAC-Bayesian theory, we explain how multimodal integration encourages generalization and quantifies uncertainty under sparse supervision. We further propose a formal metric for explanation stability, offering interpretability guarantees under low-data conditions. Taken together, the proposed framework establishes a principled foundation for constructing dependable, data-efficient diagnostic systems by jointly characterizing sample efficiency, uncertainty quantification, and interpretability in a unified theoretical setting.


Generalized Guarantees for Variational Inference in the Presence of Even and Elliptical Symmetry

arXiv.org Machine Learning

We extend several recent results providing symmetry-based guarantees for variational inference (VI) with location-scale families. VI approximates a target density~$p$ by the best match $q^*$ in a family $Q$ of tractable distributions that in general does not contain $p$. It is known that VI can recover key properties of $p$, such as its mean and correlation matrix, when $p$ and $Q$ exhibit certain symmetries and $q^*$ is found by minimizing the reverse Kullback-Leibler divergence. We extend these guarantees in two important directions. First, we provide symmetry-based guarantees for a broader family of divergences, highlighting the properties of variational objectives under which VI provably recovers the mean and correlation matrix. Second, we obtain further guarantees for VI when the target density $p$ exhibits even and elliptical symmetries in some but not all of its coordinates. These partial symmetries arise naturally in Bayesian hierarchical models, where the prior induces a challenging geometry but still possesses axes of symmetry. We illustrate these theoretical results in a number of experimental settings.


Gradient Boosted Mixed Models: Flexible Joint Estimation of Mean and Variance Components for Clustered Data

arXiv.org Machine Learning

Linear mixed models are widely used for clustered data, but their reliance on parametric forms limits flexibility in complex and high-dimensional settings. In contrast, gradient boosting methods achieve high predictive accuracy through nonparametric estimation, but do not accommodate clustered data structures or provide uncertainty quantification. We introduce Gradient Boosted Mixed Models (GBMixed), a framework and algorithm that extends boosting to jointly estimate mean and variance components via likelihood-based gradients. In addition to nonparametric mean estimation, the method models both random effects and residual variances as potentially covariate-dependent functions using flexible base learners such as regression trees or splines, enabling nonparametric estimation while maintaining interpretability. Simulations and real-world applications demonstrate accurate recovery of variance components, calibrated prediction intervals, and improved predictive accuracy relative to standard linear mixed models and nonparametric methods. GBMixed provides heteroscedastic uncertainty quantification and introduces boosting for heterogeneous random effects. This enables covariate-dependent shrinkage for cluster-specific predictions to adapt between population and cluster-level data. Under standard causal assumptions, the framework enables estimation of heterogeneous treatment effects with reliable uncertainty quantification.


Variational Visual Question Answering for Uncertainty-Aware Selective Prediction

arXiv.org Artificial Intelligence

Despite remarkable progress in recent years, vision language models (VLMs) remain prone to overconfidence and hallucinations on tasks such as Visual Question Answering (VQA) and Visual Reasoning. Bayesian methods can potentially improve reliability by helping models selectively predict, that is, models respond only when they are sufficiently confident. Unfortunately, Bayesian methods are often assumed to be costly and ineffective for large models, and so far there exists little evidence to show otherwise, especially for multimodal applications. Here, we show the effectiveness and competitive edge of variational Bayes for selective prediction in VQA for the first time. We build on recent advances in variational methods for deep learning and propose an extension called "Variational VQA". This method improves calibration and yields significant gains for selective prediction on VQA and Visual Reasoning, particularly when the error tolerance is low ($\leq 1\%$). Often, just one posterior sample can yield more reliable answers than those obtained by models trained with AdamW. In addition, we propose a new risk-averse selector that outperforms standard sample averaging by considering the variance of predictions. Overall, we present compelling evidence that variational learning is a viable option to make large VLMs safer and more trustworthy.