Learning Graphical Models
Efficient Autoregressive Inference for Transformer Probabilistic Models
Hassan, Conor, Loka, Nasrulloh, Li, Cen-You, Huang, Daolang, Chang, Paul E., Yang, Yang, Silvestrin, Francesco, Kaski, Samuel, Acerbi, Luigi
Transformer-based models for amortized probabilistic inference, such as neural processes, prior-fitted networks, and tabular foundation models, excel at single-pass marginal prediction. However, many real-world applications, from signal interpolation to multi-column tabular predictions, require coherent joint distributions that capture dependencies between predictions. While purely autoregressive architectures efficiently generate such distributions, they sacrifice the flexible set-conditioning that makes these models powerful for meta-learning. Conversely, the standard approach to obtain joint distributions from set-based models requires expensive re-encoding of the entire augmented conditioning set at each autoregressive step. We introduce a causal autoregressive buffer that preserves the advantages of both paradigms. Our approach decouples context encoding from updating the conditioning set. The model processes the context once and caches it. A dynamic buffer then captures target dependencies: as targets are incorporated, they enter the buffer and attend to both the cached context and previously buffered targets. This enables efficient batched autoregressive generation and one-pass joint log-likelihood evaluation. A unified training strategy allows seamless integration of set-based and autoregressive modes at minimal additional cost. Across synthetic functions, EEG signals, cognitive models, and tabular data, our method matches predictive accuracy of strong baselines while delivering up to 20 times faster joint sampling. Our approach combines the efficiency of autoregressive generative models with the representational power of set-based conditioning, making joint prediction practical for transformer-based probabilistic models.
A unified Bayesian framework for adversarial robustness
Arce, Pablo G., Naveiro, Roi, Insua, David Ríos
The vulnerability of machine learning models to adversarial attacks remains a critical security challenge. Traditional defenses, such as adversarial training, typically robustify models by minimizing a worst-case loss. However, these deterministic approaches do not account for uncertainty in the adversary's attack. While stochastic defenses placing a probability distribution on the adversary exist, they often lack statistical rigor and fail to make explicit their underlying assumptions. To resolve these issues, we introduce a formal Bayesian framework that models adversarial uncertainty through a stochastic channel, articulating all probabilistic assumptions. This yields two robustification strategies: a proactive defense enacted during training, aligned with adversarial training, and a reactive defense enacted during operations, aligned with adversarial purification. Several previous defenses can be recovered as limiting cases of our model. We empirically validate our methodology, showcasing the benefits of explicitly modeling adversarial uncertainty.
Zero-shot Structure Learning and Planning for Autonomous Robot Navigation using Active Inference
de tinguy, Daria, Verbelen, Tim, Gamba, Emilio, Dhoedt, Bart
Autonomous navigation in unfamiliar environments requires robots to simultaneously explore, localise, and plan under uncertainty, without relying on predefined maps or extensive training. We present a biologically inspired, Active Inference-based framework, Active Inference MAPping and Planning (AIMAPP). This model unifies mapping, localisation, and decision-making within a single generative model. Inspired by hippocampal navigation, it uses topological reasoning, place-cell encoding, and episodic memory to guide behaviour. The agent builds and updates a sparse topological map online, learns state transitions dynamically, and plans actions by minimising Expected Free Energy. This allows it to balance goal-directed and exploratory behaviours. We implemented a ROS-compatible navigation system that is sensor and robot-agnostic, capable of integrating with diverse hardware configurations. It operates in a fully self-supervised manner, is resilient to drift, and supports both exploration and goal-directed navigation without any pre-training. We demonstrate robust performance in large-scale real and simulated environments against state-of-the-art planning models, highlighting the system's adaptability to ambiguous observations, environmental changes, and sensor noise. The model offers a biologically inspired, modular solution to scalable, self-supervised navigation in unstructured settings. AIMAPP is available at https://github.com/decide-ugent/AIMAPP.
StatEval: A Comprehensive Benchmark for Large Language Models in Statistics
Lu, Yuchen, Yang, Run, Zhang, Yichen, Yu, Shuguang, Dai, Runpeng, Wang, Ziwei, Xiang, Jiayi, E, Wenxin, Gao, Siran, Ruan, Xinyao, Huang, Yirui, Xi, Chenjing, Hu, Haibo, Fu, Yueming, Yu, Qinglan, Wei, Xiaobing, Gu, Jiani, Sun, Rui, Jia, Jiaxuan, Zhou, Fan
Large language models (LLMs) have advanced rapidly in recent years (Brown et al., 2020; Touvron et al., 2023), demonstrating remarkable progress in complex reasoning (Guo et al., 2025), fluent text generation, and even automated proof discovery (Yu et al., 2025). These advances have spurred growing adoption of LLMs across education, data science, and research, where they are increasingly used for tutoring, problem explanation, data analysis, and hypothesis formulation (Wu et al., 2021; Polu and Sutskever, 2020; Khan et al., 2023; Gao et al., 2023). However, despite their broad deployment in quantitative domains, the field of statistics, which forms the foundation of modern data-driven science, has received little attention in LLM evaluation. Statistics differs fundamentally from other quantitative disciplines. Rather than focusing on symbolic manipulation or fixed-form computation, it emphasizes reasoning under uncertainty, connecting probability theory, inference, regression, Bayesian analysis, multivariate methods, and asymptotic theory into a unified framework. Yet existing large-scale LLM evaluations rarely cover these competencies: statistical problems account for less than 3% of recent reasoning benchmarks (Paster et al., 2025), and when included, they are typically treated as isolated probability puzzles without structured categorization or coverage of inferential reasoning (Gao et al., 2024). This gap makes it impossible to rigorously assess whether LLMs can function as capable statisticians or support data-driven scientific discovery. To bridge this critical gap, we introduce StatEval, the first large-scale benchmark dedicated to evaluating large language models on statistical reasoning. With nearly 20,000 meticulously curated problems, StatEval covers the entire spectrum of statistics, from basic undergraduate exercises to advanced research-level challenges, captures the full 2 breadth and depth of the discipline, as illustrated in Figure 1.
Performance Analysis of Machine Learning Algorithms in Chronic Kidney Disease Prediction
Ahmed, Iftekhar, Chowdhury, Tanzil Ebad, Routh, Biggo Bushon, Tasmiya, Nafisa, Sakib, Shadman, Chowdhury, Adil Ahmed
Kidneys are the filter of the human body. About 10% of the global population is thought to be affected by Chronic Kidney Disease (CKD), which causes kidney function to decline. To protect in danger patients from additional kidney damage, effective risk evaluation of CKD and appropriate CKD monitoring are crucial. Due to quick and precise detection capabilities, Machine Learning models can help practitioners accomplish this goal efficiently; therefore, an enormous number of diagnosis systems and processes in the healthcare sector nowadays are relying on machine learning due to its disease prediction capability. In this study, we designed and suggested disease predictive computer-aided designs for the diagnosis of CKD. The dataset for CKD is attained from the repository of machine learning of UCL, with a few missing values; those are filled in using "mean-mode" and "Random sampling method" strategies. After successfully achieving the missing data, eight ML techniques (Random Forest, SVM, Naive Bayes, Logistic Regression, KNN, XGBoost, Decision Tree, and AdaBoost) were used to establish models, and the performance evaluation comparisons among the result accuracies are measured by the techniques to find the machine learning models with the highest accuracy. Among them, Random Forest as well as Logistic Regression showed an outstanding 99% accuracy, followed by the Ada Boost, XGBoost, Naive Bayes, Decision Tree, and SVM, whereas the KNN classifier model stands last with an accuracy of 73%.
On Uniformly Scaling Flows: A Density-Aligned Approach to Deep One-Class Classification
Zaid, Faried Abu, Katzke, Tim, Müller, Emmanuel, Neider, Daniel
Unsupervised anomaly detection is often framed around two widely studied paradigms. Deep one-class classification, exemplified by Deep SVDD, learns compact latent representations of normality, while density estimators realized by normalizing flows directly model the likelihood of nominal data. In this work, we show that uniformly scaling flows (USFs), normalizing flows with a constant Jacobian determinant, precisely connect these approaches. Specifically, we prove how training a USF via maximum-likelihood reduces to a Deep SVDD objective with a unique regularization that inherently prevents representational collapse. This theoretical bridge implies that USFs inherit both the density faithfulness of flows and the distance-based reasoning of one-class methods. We further demonstrate that USFs induce a tighter alignment between negative log-likelihood and latent norm than either Deep SVDD or non-USFs, and how recent hybrid approaches combining one-class objectives with VAEs can be naturally extended to USFs. Consequently, we advocate using USFs as a drop-in replacement for non-USFs in modern anomaly detection architectures. Empirically, this substitution yields consistent performance gains and substantially improved training stability across multiple benchmarks and model backbones for both image-level and pixel-level detection. These results unify two major anomaly detection paradigms, advancing both theoretical understanding and practical performance.
Efficient Bayesian Inference from Noisy Pairwise Comparisons
Aczel, Till, Theis, Lucas, Roger, Wattenhofer
Evaluating generative models is challenging because standard metrics often fail to reflect human preferences. Human evaluations are more reliable but costly and noisy, as participants vary in expertise, attention, and diligence. Pairwise comparisons improve consistency, yet aggregating them into overall quality scores requires careful modeling. Bradley-Terry-based methods update item scores from comparisons, but existing approaches either ignore rater variability or lack convergence guarantees, limiting robustness and interpretability. We introduce BBQ, a Bayesian Bradley-Terry variant that explicitly models rater quality, downweighting or removing unreliable participants, and provides guaranteed monotonic likelihood convergence through an Expectation-Maximization algorithm. Empirical results show that BBQ achieves faster convergence, well-calibrated uncertainty estimates, and more robust, interpretable rankings compared to baseline Bradley-Terry models, even with noisy or crowdsourced raters. This framework enables more reliable and cost-effective human evaluation of generative models.
Navigation and Exploration with Active Inference: from Biology to Industry
de Tinguy, Daria, Verbelen, Tim, Dhoedt, Bart
By building and updating internal cognitive maps, animals exhibit extraordinary navigation abilities in complex, dynamic environments. Inspired by these biological mechanisms, we present a real time robotic navigation system grounded in the Active Inference Framework (AIF). Our model incrementally constructs a topological map, infers the agent's location, and plans actions by minimising expected uncertainty and fulfilling perceptual goals without any prior training. Integrated into the ROS2 ecosystem, we validate its adaptability and efficiency across both 2D and 3D environments (simulated and real world), demonstrating competitive performance with traditional and state of the art exploration approaches while offering a biologically inspired navigation approach.
CausalDynamics: A large-scale benchmark for structural discovery of dynamical causal models
Herdeanu, Benjamin, Nathaniel, Juan, Roesch, Carla, Buch, Jatan, Ramien, Gregor, Haux, Johannes, Gentine, Pierre
Causal discovery for dynamical systems poses a major challenge in fields where active interventions are infeasible. Most methods used to investigate these systems and their associated benchmarks are tailored to deterministic, low-dimensional and weakly nonlinear time-series data. To address these limitations, we present CausalDynamics, a large-scale benchmark and extensible data generation framework to advance the structural discovery of dynamical causal models. Our benchmark consists of true causal graphs derived from thousands of both linearly and nonlinearly coupled ordinary and stochastic differential equations as well as two idealized climate models. We perform a comprehensive evaluation of state-of-the-art causal discovery algorithms for graph reconstruction on systems with noisy, confounded, and lagged dynamics. CausalDynamics consists of a plug-and-play, build-your-own coupling workflow that enables the construction of a hierarchy of physical systems. We anticipate that our framework will facilitate the development of robust causal discovery algorithms that are broadly applicable across domains while addressing their unique challenges. We provide a user-friendly implementation and documentation on https://kausable.github.io/CausalDynamics.
Prime Implicant Explanations for Reaction Feasibility Prediction
Weinbauer, Klaus, Phan, Tieu-Long, Stadler, Peter F., Gärtner, Thomas, Malhotra, Sagar
Machine learning models that predict the feasibility of chemical reactions have become central to automated synthesis planning. Despite their predictive success, these models often lack transparency and interpretability. We introduce a novel formulation of prime implicant explanations--also known as minimally sufficient reasons--tailored to this domain, and propose an algorithm for computing such explanations in small-scale reaction prediction tasks. Preliminary experiments demonstrate that our notion of prime implicant explanations conservatively captures the ground truth explanations. That is, such explanations often contain redundant bonds and atoms but consistently capture the molecular attributes that are essential for predicting reaction feasibility.