Goto

Collaborating Authors

 Bayesian Inference


Learning Discrete Bayesian Networks with Hierarchical Dirichlet Shrinkage

arXiv.org Machine Learning

Discrete Bayesian networks (DBNs) provide a broadly useful framework for modeling dependence structures in multivariate categorical data. There is a vast literature on methods for inferring conditional probabilities and graphical structure in DBNs, but data sparsity and parametric assumptions are major practical issues. In this article, we detail a comprehensive Bayesian framework for learning DBNs. First, we propose a hierarchical prior for the conditional probabilities that enables complicated interactions between parent variables and stability in sparse regimes. We give a novel Markov chain Monte Carlo (MCMC) algorithm utilizing parallel Langevin proposals to generate exact posterior samples, avoiding the pitfalls of variational approximations. Moreover, we verify that the full conditional distribution of the concentration parameters is log-concave under mild conditions, facilitating efficient sampling. We then propose two methods for learning network structures, including parent sets, Markov blankets, and DAGs, from categorical data. The first cycles through individual edges each MCMC iteration, whereas the second updates the entire structure as a single step. We evaluate the accuracy, power, and MCMC performance of our methods on several simulation studies. Finally, we apply our methodology to uncover prognostic network structure from primary breast cancer samples.


Bayesian Parametric Matrix Models: Principled Uncertainty Quantification for Spectral Learning

arXiv.org Machine Learning

Scientific machine learning increasingly uses spectral methods to understand physical systems. Current spectral learning approaches provide only point estimates without uncertainty quantification, limiting their use in safety-critical applications where prediction confidence is essential. Parametric matrix models have emerged as powerful tools for scientific machine learning, achieving exceptional performance by learning governing equations. However, their deterministic nature limits deployment in uncertainty quantification applications. We introduce Bayesian parametric matrix models (B-PMMs), a principled framework that extends PMMs to provide uncertainty estimates while preserving their spectral structure and computational efficiency. B-PMM addresses the fundamental challenge of quantifying uncertainty in matrix eigenvalue problems where standard Bayesian methods fail due to the geometric constraints of spectral decomposition. The theoretical contributions include: (i) adaptive spectral decomposition with regularized matrix perturbation bounds that characterize eigenvalue uncertainty propagation, (ii) structured variational inference algorithms using manifold-aware matrix-variate Gaussian posteriors that respect Hermitian constraints, and (iii) finite-sample calibration guarantees with explicit dependence on spectral gaps and problem conditioning. Experimental validation across matrix dimensions from 5x5 to 500x500 with perfect convergence rates demonstrates that B-PMMs achieve exceptional uncertainty calibration (ECE < 0.05) while maintaining favorable scaling. The framework exhibits graceful degradation under spectral ill-conditioning and provides reliable uncertainty estimates even in near-degenerate regimes. The proposed framework supports robust spectral learning in uncertainty-critical domains and lays the groundwork for broader Bayesian spectral machine learning.


On the Correlation between Individual Fairness and Predictive Accuracy in Probabilistic Models

arXiv.org Artificial Intelligence

We investigate individual fairness in generative probabilistic classifiers by analysing the robustness of posterior inferences to perturbations in private features. Building on established results in robustness analysis, we hypothesise a correlation between robustness and predictive accuracy, specifically, instances exhibiting greater robustness are more likely to be classified accurately. We empirically assess this hypothesis using a benchmark of fourteen datasets with fairness concerns, employing Bayesian networks as the underlying generative models. To address the computational complexity associated with robustness analysis over multiple private features with Bayesian networks, we reformulate the problem as a most probable explanation task in an auxiliary Markov random field. Our experiments confirm the hypothesis about the correlation, suggesting novel directions to mitigate the traditional trade-off between fairness and accuracy.


Physical Complexity of a Cognitive Artifact

arXiv.org Artificial Intelligence

There are currently two well established domains for studying general problem solving. The first describes strategies used by humans on both experimental and real-world tasks. Human problem solving is captured through a number of frameworks including skill acquisition [1] and automaticity [2], the application of expert knowledge [3, 4], the use of heuristics [5, 6], reinforcement learning and conditioning [7, 8], Bayesian inference [9, 10], analogy making [11, 12], collective intelligence and cognition [13, 14], simulation intelligence [15], the use of external representations [16, 17] and the synergy of mind and matter through exbodiment [18]. The second domain, computational problem solving, investigates algorithms that enable computers to tackle problems effectively . Within this domain, two branches especially pertinent to the present question are: (i) computational complexity theory, which analyzes the resources (time, memory, etc.) required to solve problems as functions of input size, typically in the asymptotic limit; and (ii) the study of search algorithms, which seeks efficient solutions to specific tasks (e.g., games and puzzles) by exploiting the combinatorial structure of state spaces, often via heuristics [19-22].


Deriving the Scaled-Dot-Function via Maximum Likelihood Estimation and Maximum Entropy Approach

arXiv.org Artificial Intelligence

In this paper, we present a maximum likelihood estimation approach to determine the value vector in transformer models. We model the sequence of value vectors, key vectors, and the query vector as a sequence of Gaussian distributions. The variance in each Gaussian distribution depends on the time step, the corresponding key vector, and the query vector. The mean value in each Gaussian distribution depends on the time step, and the corresponding value vector. This analysis may offer a new explanation of the scaled-dot-product function or softmax function used in transformer architectures [1]. Another explanation, inspired by [4], is based on the maximum entropy approach in natural language processing [5]. In this approach, a query vector and key vectors are used to derive the feature functions for the maximum entropy model.


Strategic Concealment of Environment Representations in Competitive Games

arXiv.org Artificial Intelligence

This paper investigates the strategic concealment of environment representations used by players in competitive games. We consider a defense scenario in which one player (the Defender) seeks to infer and exploit the representation used by the other player (the Attacker). The interaction between the two players is modeled as a Bayesian game: the Defender infers the Attacker's representation from its trajectory and places barriers to obstruct the Attacker's path towards its goal, while the Attacker obfuscates its representation type to mislead the Defender. We solve for the Perfect Bayesian Nash Equilibrium via a bilinear program that integrates Bayesian inference, strategic planning, and belief manipulation. Simulations show that purposeful concealment naturally emerges: the Attacker randomizes its trajectory to manipulate the Defender's belief, inducing suboptimal barrier selections and thereby gaining a strategic advantage.


Predictable Compression Failures: Why Language Models Actually Hallucinate

arXiv.org Machine Learning

Large language models perform near-Bayesian inference yet violate permutation invariance on exchangeable data. We resolve this by showing transformers minimize expected conditional description length (cross-entropy) over orderings, $\mathbb{E}_ฯ€[\ell(Y \mid ฮ“_ฯ€(X))]$, which admits a Kolmogorov-complexity interpretation up to additive constants, rather than the permutation-invariant description length $\ell(Y \mid X)$. This makes them Bayesian in expectation, not in realization. We derive (i) a Quantified Martingale Violation bound showing order-induced deviations scale as $O(\log n)$ with constants; (ii) the Expectation-level Decompression Law linking information budgets to reliability for Bernoulli predicates; and (iii) deployable planners (B2T/RoH/ISR) for answer/abstain decisions. Empirically, permutation dispersion follows $a+b\ln n$ (Qwen2-7B $b \approx 0.377$, Llama-3.1-8B $b \approx 0.147$); permutation mixtures improve ground-truth likelihood/accuracy; and randomized dose-response shows hallucinations drop by $\sim 0.13$ per additional nat. A pre-specified audit with a fixed ISR=1.0 achieves near-0\% hallucinations via calibrated refusal at 24\% abstention. The framework turns hallucinations into predictable compression failures and enables principled information budgeting.


TabStruct: Measuring Structural Fidelity of Tabular Data

arXiv.org Artificial Intelligence

Evaluating tabular generators remains a challenging problem, as the unique causal structural prior of heterogeneous tabular data does not lend itself to intuitive human inspection. Recent work has introduced structural fidelity as a tabular-specific evaluation dimension to assess whether synthetic data complies with the causal structures of real data. However, existing benchmarks often neglect the interplay between structural fidelity and conventional evaluation dimensions, thus failing to provide a holistic understanding of model performance. Moreover, they are typically limited to toy datasets, as quantifying existing structural fidelity metrics requires access to ground-truth causal structures, which are rarely available for real-world datasets. In this paper, we propose a novel evaluation framework that jointly considers structural fidelity and conventional evaluation dimensions. We introduce a new evaluation metric, $\textbf{global utility}$, which enables the assessment of structural fidelity even in the absence of ground-truth causal structures. In addition, we present $\textbf{TabStruct}$, a comprehensive evaluation benchmark offering large-scale quantitative analysis on 13 tabular generators from nine distinct categories, across 29 datasets. Our results demonstrate that global utility provides a task-independent, domain-agnostic lens for tabular generator performance. We release the TabStruct benchmark suite, including all datasets, evaluation pipelines, and raw results. Code is available at https://github.com/SilenceX12138/TabStruct.


Memory Unscented Particle Filter for 6-DOF Tactile Localization

arXiv.org Artificial Intelligence

This paper addresses 6-DOF (degree-of-freedom) tactile localization, i.e. the pose estimation of tridimensional objects given tactile measurements. This estimation problem is fundamental for the operation of autonomous robots that are often required to manipulate and grasp objects whose pose is a-priori unknown. The nature of tactile measurements, the strict time requirements for real-time operation and the multimodality of the involved probability distributions pose remarkable challenges and call for advanced nonlinear filtering techniques. Following a Bayesian approach, this paper proposes a novel and effective algorithm, named Memory Unscented Particle Filter (MUPF), which solves the 6-DOF localization problem recursively in real-time by only exploiting contact point measurements. MUPF combines a modified particle filter that incorporates a sliding memory of past measurements to better handle multimodal distributions, along with the unscented Kalman filter that moves the particles towards regions of the search space that are more likely with the measurements. The performance of the proposed MUPF algorithm has been assessed both in simulation and on a real robotic system equipped with tactile sensors (i.e., the iCub humanoid robot). The experiments show that the algorithm provides accurate and reliable localization even with a low number of particles and, hence, is compatible with real-time requirements.


Adaptive-GraphSketch: Real-Time Edge Anomaly Detection via Multi-Layer Tensor Sketching and Temporal Decay

arXiv.org Artificial Intelligence

Anomaly detection in dynamic graphs is essential for identifying malicious activities, fraud, and unexpected behaviors in real-world systems such as cybersecurity and power grids. However, existing approaches struggle with scalability, probabilistic interpretability, and adaptability to evolving traffic patterns. In this paper, we propose ADAPTIVE-GRAPHSKETCH, a lightweight and scalable framework for real-time anomaly detection in streaming edge data. Our method integrates temporal multi-tensor sketching with Count-Min Sketch using Conservative Update (CMS-CU) to compactly track edge frequency patterns with bounded memory, while mitigating hash collision issues. We incorporate Bayesian inference for probabilistic anomaly scoring and apply Exponentially Weighted Moving Average (EWMA) for adaptive thresholding tuned to burst intensity. Extensive experiments on four real-world intrusion detection datasets demonstrate that ADAPTIVE-GRAPHSKETCH outperforms state-of-the-art baselines such as ANOEDGE-G/L, MIDAS-R, and F-FADE, achieving up to 6.5% AUC gain on CIC-IDS2018 and up to 15.6% on CIC-DDoS2019, while processing 20 million edges in under 3.4 seconds using only 10 hash functions. Our results show that ADAPTIVE-GRAPHSKETCH is practical and effective for fast, accurate anomaly detection in large-scale streaming graphs. Keywords: Anomaly Detection, Streaming, Real-time, Dynamic Graphs, Edge Streams, Tensor Sketching