Bayesian Inference
Meta-Learning Theory-Informed Inductive Biases using Deep Kernel Gaussian Processes
Zakirov, Bahti, Tkaฤik, Gaลกper
Normative and task-driven theories offer powerful top-down explanations for biological systems, yet the goals of quantitatively arbitrating between competing theories, and utilizing them as inductive biases to improve data-driven fits of real biological datasets are prohibitively laborious, and often impossible. To this end, we introduce a Bayesian meta-learning framework designed to automatically convert raw functional predictions from normative theories into tractable probabilistic models. We employ adaptive deep kernel Gaussian processes, meta-learning a kernel on synthetic data generated from a normative theory. This Theory-Informed Kernel specifies a probabilistic model representing the theory predictions -- usable for both fitting data and rigorously validating the theory. As a demonstration, we apply our framework to the early visual system, using efficient coding as our normative theory. We show improved response prediction accuracy in ex vivo recordings of mouse retinal ganglion cells stimulated by natural scenes compared to conventional data-driven baselines, while providing well-calibrated uncertainty estimates and interpretable representations. Using exact Bayesian model selection, we also show that our informed kernel can accurately infer the degree of theory-match from data, confirming faithful encapsulation of theory structure. This work provides a more general, scalable, and automated approach for integrating theoretical knowledge into data-driven scientific inquiry in neuroscience and beyond.
Surjective Independence of Causal Influences for Local Bayesian Network Structures
Drury, Kieran, Barons, Martine J., Smith, Jim Q.
The very expressiveness of Bayesian networks can introduce fresh challenges due to the large number of relationships they often model. In many domains, it is thus often essential to supplement any available data with elicited expert judgements. This in turn leads to two key challenges: the cognitive burden of these judgements is often very high, and there are a very large number of judgements required to obtain a full probability model. We can mitigate both issues by introducing assumptions such as independence of causal influences (ICI) on the local structures throughout the network, restricting the parameter space of the model. However, the assumption of ICI is often unjustified and overly strong. In this paper, we introduce the surjective independence of causal influences (SICI) model which relaxes the ICI assumption and provides a more viable, practical alternative local structure model that facilitates efficient Bayesian network parameterisation.
SAIP: A Plug-and-Play Scale-adaptive Module in Diffusion-based Inverse Problems
Solving inverse problems with diffusion models has shown promise in tasks such as image restoration. A common approach is to formulate the problem in a Bayesian framework and sample from the posterior by combining the prior score with the likelihood score. Since the likelihood term is often intractable, estimators like DPS, DMPS, and $ฯ$GDM are widely adopted. However, these methods rely on a fixed, manually tuned scale to balance prior and likelihood contributions. Such a static design is suboptimal, as the ideal balance varies across timesteps and tasks, limiting performance and generalization. To address this issue, we propose SAIP, a plug-and-play module that adaptively refines the scale at each timestep without retraining or altering the diffusion backbone. SAIP integrates seamlessly into existing samplers and consistently improves reconstruction quality across diverse image restoration tasks, including challenging scenarios.
Foundation Models for Causal Inference via Prior-Data Fitted Networks
Ma, Yuchen, Frauen, Dennis, Javurek, Emil, Feuerriegel, Stefan
Prior-data fitted networks (PFNs) have recently been proposed as a promising way to train tabular foundation models. PFNs are transformers that are pre-trained on synthetic data generated from a prespecified prior distribution and that enable Bayesian inference through in-context learning. In this paper, we introduce CausalFM, a comprehensive framework for training PFN-based foundation models in various causal inference settings. First, we formalize the construction of Bayesian priors for causal inference based on structural causal models (SCMs) in a principled way and derive necessary criteria for the validity of such priors. Building on this, we propose a novel family of prior distributions using causality-inspired Bayesian neural networks that enable CausalFM to perform Bayesian causal inference in various settings, including for back-door, front-door, and instrumental variable adjustment. Finally, we instantiate CausalFM and explicitly train models to perform in-context learning in these settings. We show that CausalFM achieves competitive in-context learning performance even when compared to baselines that are specifically trained for the task at hand. In sum, our framework can be used as a general recipe to train foundation models for various causal inference settings. In contrast to the current state-of-the-art in causal inference, CausalFM offers a novel paradigm with the potential to fundamentally change how practitioners perform causal inference in medicine, economics, and other disciplines.
Learning with Local Search MCMC Layers
Vivier-Ardisson, Germain, Blondel, Mathieu, Parmentier, Axel
Integrating combinatorial optimization layers into neural networks has recently attracted significant research interest. However, many existing approaches lack theoretical guarantees or fail to perform adequately when relying on inexact solvers. This is a critical limitation, as many operations research problems are NP-hard, often necessitating the use of neighborhood-based local search heuristics. These heuristics iteratively generate and evaluate candidate solutions based on an acceptance rule. In this paper, we introduce a theoretically-principled approach for learning with such inexact combinatorial solvers. Inspired by the connection between simulated annealing and Metropolis-Hastings, we propose to transform problem-specific neighborhood systems used in local search heuristics into proposal distributions, implementing MCMC on the combinatorial space of feasible solutions. This allows us to construct differentiable combinatorial layers and associated loss functions. Replacing an exact solver by a local search strongly reduces the computational burden of learning on many applications. We demonstrate our approach on a large-scale dynamic vehicle routing problem with time windows.
Do Repetitions Matter? Strengthening Reliability in LLM Evaluations
Gonzalez, Miguel Angel Alvarado, Hernandez, Michelle Bruno, Perez, Miguel Angel Peรฑaloza, Orozco, Bruno Lopez, Soto, Jesus Tadeo Cruz, Malagon, Sandra
LLM leaderboards often rely on single stochastic runs, but how many repetitions are required for reliable conclusions remains unclear. We re-evaluate eight state-of-the-art models on the AI4Math Benchmark with three independent runs per setting. Using mixed-effects logistic regression, domain-level marginal means, rank-instability analysis, and run-to-run reliability, we assessed the value of additional repetitions. Our findings shows that Single-run leaderboards are brittle: 10/12 slices (83\%) invert at least one pairwise rank relative to the three-run majority, despite a zero sign-flip rate for pairwise significance and moderate overall interclass correlation. Averaging runs yields modest SE shrinkage ($\sim$5\% from one to three) but large ranking gains; two runs remove $\sim$83\% of single-run inversions. We provide cost-aware guidance for practitioners: treat evaluation as an experiment, report uncertainty, and use $\geq 2$ repetitions under stochastic decoding. These practices improve robustness while remaining feasible for small teams and help align model comparisons with real-world reliability.
Deceive, Detect, and Disclose: Large Language Models Play Mini-Mafia
Costa, Davi Bastos, Vicente, Renato
Mafia is a social deduction game where informed mafia compete against uninformed townsfolk. Its asymmetry of information and reliance on theory-of-mind reasoning mirror real-world multi-agent scenarios, making it a useful testbed for evaluating the social intelligence of large language models (LLMs). To support a systematic study, we introduce Mini-Mafia: a simplified four-player variant with one mafioso, one detective, and two villagers. We set the mafioso to kill a villager and the detective to investigate the mafioso during the night, reducing the game to a single day phase of discussion and voting. This setup isolates three interactive capabilities through role-specific win conditions: the mafioso must deceive, the villagers must detect deception, and the detective must effectively disclose information. To measure these skills, we have LLMs play against each other, creating the Mini-Mafia Benchmark: a two-stage framework that first estimates win rates within fixed opponent configurations, then aggregates performance across them using standardized scoring. Built entirely from model interactions without external data, the benchmark evolves as new models are introduced, with each one serving both as a new opponent and as a subject of evaluation. Our experiments reveal counterintuitive results, including cases where smaller models outperform larger ones. Beyond benchmarking, Mini-Mafia enables quantitative study of emergent multi-agent dynamics such as name bias and last-speaker advantage. It also contributes to AI safety by generating training data for deception detectors and by tracking models' deception capabilities against human baselines.
Probabilistic Consistency in Machine Learning and Its Connection to Uncertainty Quantification
Patrone, Paul, Kearsley, Anthony
Machine learning (ML) is often viewed as a powerful data analysis tool that is easy to learn because of its black-box nature. Yet this very nature also makes it difficult to quantify confidence in predictions extracted from ML models, and more fundamentally, to understand how such models are mathematical abstractions of training data. The goal of this paper is to unravel these issues and their connections to uncertainty quantification (UQ) by pursuing a line of reasoning motivated by diagnostics. In such settings, prevalence - i.e. the fraction of elements in class - is often of inherent interest. Here we analyze the many interpretations of prevalence to derive a level-set theory of classification, which shows that certain types of self-consistent ML models are equivalent to class-conditional probability distributions. We begin by studying the properties of binary Bayes optimal classifiers, recognizing that their boundary sets can be reinterpreted as level-sets of pairwise density ratios. By parameterizing Bayes classifiers in terms of the prevalence, we then show that they satisfy important monotonicity and class-switching properties that can be used to deduce the density ratios without direct access to the boundary sets. Moreover, this information is sufficient for tasks such as constructing the multiclass Bayes-optimal classifier and estimating inherent uncertainty in the class assignments. In the multiclass case, we use these results to deduce normalization and self-consistency conditions, the latter being equivalent to the law of total probability for classifiers. We also show that these are necessary conditions for arbitrary ML models to have valid probabilistic interpretations. Throughout we demonstrate how this analysis informs the broader task of UQ for ML via an uncertainty propagation framework.
A Necessary Step toward Faithfulness: Measuring and Improving Consistency in Free-Text Explanations
Zhao, Lingjun, Daumรฉ, Hal III
Faithful free-text explanations are important to ensure transparency in high-stakes AI decision-making contexts, but they are challenging to generate by language models and assess by humans. In this paper, we present a measure for Prediction-EXplanation (PEX) consistency, by extending the concept of weight of evidence. This measure quantifies how much a free-text explanation supports or opposes a prediction, serving as an important aspect of explanation faithfulness. Our analysis reveals that more than 62% explanations generated by large language models lack this consistency. We show that applying direct preference optimization improves the consistency of generated explanations across three model families, with improvement ranging from 43.1% to 292.3%. Furthermore, we demonstrate that optimizing this consistency measure can improve explanation faithfulness by up to 9.7%.
Overcoming Over-Fitting in Constraint Acquisition via Query-Driven Interactive Refinement
Balafas, Vasileios, Tsouros, Dimos, Ploskas, Nikolaos, Stergiou, Kostas
Manual modeling in Constraint Programming is a substantial bottleneck, which Constraint Acquisition (CA) aims to automate. However, passive CA methods are prone to over-fitting, often learning models that include spurious global constraints when trained on limited data, while purely active methods can be query-intensive. We introduce a hybrid CA framework specifically designed to address the challenge of over-fitting in CA. Our approach integrates passive learning for initial candidate generation, a query-driven interactive refinement phase that utilizes probabilistic confidence scores (initialized by machine learning priors) to systematically identify over-fitted constraints, and a specialized subset exploration mechanism to recover valid substructures from rejected candidates. A final active learning phase ensures model completeness. Extensive experiments on diverse benchmarks demonstrate that our interactive refinement phase is crucial for achieving high target model coverage and overall model accuracy from limited examples, doing so with manageable query complexity. This framework represents a substantial advancement towards robust and practical constraint acquisition in data-limited scenarios.