Bayesian Learning
ACE: Adapting sampling for Counterfactual Explanations
Guerrero, Margarita A., Rojas, Cristian R.
Counterfactual Explanations (CFEs) interpret machine learning models by identifying the smallest change to input features needed to change the model's prediction to a desired output. For classification tasks, CFEs determine how close a given sample is to the decision boundary of a trained classifier. Existing methods are often sample-inefficient, requiring numerous evaluations of a black-box model -- an approach that is both costly and impractical when access to the model is limited. We propose Adaptive sampling for Counterfactual Explanations (ACE), a sample-efficient algorithm combining Bayesian estimation and stochastic optimization to approximate the decision boundary with fewer queries. By prioritizing informative points, ACE minimizes evaluations while generating accurate and feasible CFEs. Extensive empirical results show that ACE achieves superior evaluation efficiency compared to state-of-the-art methods, while maintaining effectiveness in identifying minimal and actionable changes.
Staged Event Trees for Transparent Treatment Effect Estimation
Varando, Gherardo, Leonelli, Manuele, Cerdร -Bautista, Jordi, Sitokonstantinou, Vasileios, Camps-Valls, Gustau
Average and conditional treatment effects are fundamental causal quantities used to evaluate the effectiveness of treatments in various critical applications, including clinical settings and policy-making. Beyond the gold-standard estimators from randomized trials, numerous methods have been proposed to estimate treatment effects using observational data. In this paper, we provide a novel characterization of widely used causal inference techniques within the framework of staged event trees, demonstrating their capacity to enhance treatment effect estimation. These models offer a distinct advantage due to their interpretability, making them particularly valuable for practical applications. We implement classical estimators within the framework of staged event trees and illustrate their capabilities through both simulation studies and real-world applications. Furthermore, we showcase how staged event trees explicitly and visually describe when standard causal assumptions, such as positivity, hold, further enhancing their practical utility.
Nonparametric inference under shape constraints: past, present and future
We survey the field of nonparametric inference under shape constraints, providing a historical overview and a perspective on its current state. An outlook and some open problems offer thoughts on future directions. 1 Introduction. Traditionally, we think of statistical methods as being divided into parametric approaches, which can be restrictive, but where estimation is typically straightforward (e.g. using maximum likelihood), and nonparametric methods, which are more flexible but often require careful choices of tuning parameters. Nonparametric inference under shape constraints sits somewhere in the middle, seeking in some ways the best of both worlds. The origins of the field are often traced to Grenander (1956), who proved that there exists a unique maximum likelihood estimator (MLE) of a decreasing density on the non-negative half-line (and was able to characterise it explicitly).
BALLAST: Bayesian Active Learning with Look-ahead Amendment for Sea-drifter Trajectories under Spatio-Temporal Vector Fields
Zhang, Rui-Yang, Moss, Henry B., Astfalck, Lachlan, Cripps, Edward, Leslie, David S.
We introduce a formal active learning methodology for guiding the placement of Lagrangian observers to infer time-dependent vector fields -- a key task in oceanography, marine science, and ocean engineering -- using a physics-informed spatio-temporal Gaussian process surrogate model. The majority of existing placement campaigns either follow standard `space-filling' designs or relatively ad-hoc expert opinions. A key challenge to applying principled active learning in this setting is that Lagrangian observers are continuously advected through the vector field, so they make measurements at different locations and times. It is, therefore, important to consider the likely future trajectories of placed observers to account for the utility of candidate placement locations. To this end, we present BALLAST: Bayesian Active Learning with Look-ahead Amendment for Sea-drifter Trajectories. We observe noticeable benefits of BALLAST-aided sequential observer placement strategies on both synthetic and high-fidelity ocean current models.
Uncertainty Quantification for Regression using Proper Scoring Rules
Fishkov, Alexander, Schweighofer, Kajetan, Ielanskyi, Mykyta, Kotelevskii, Nikita, Guizani, Mohsen, Panov, Maxim
Quantifying uncertainty of machine learning model predictions is essential for reliable decision-making, especially in safety-critical applications. Recently, uncertainty quantification (UQ) theory has advanced significantly, building on a firm basis of learning with proper scoring rules. However, these advances were focused on classification, while extending these ideas to regression remains challenging. In this work, we introduce a unified UQ framework for regression based on proper scoring rules, such as CRPS, logarithmic, squared error, and quadratic scores. We derive closed-form expressions for the resulting uncertainty measures under practical parametric assumptions and show how to estimate them using ensembles of models. In particular, the derived uncertainty measures naturally decompose into aleatoric and epistemic components. The framework recovers popular regression UQ measures based on predictive variance and differential entropy. Our broad evaluation on synthetic and real-world regression datasets provides guidance for selecting reliable UQ measures.
Reconcile Certified Robustness and Accuracy for DNN-based Smoothed Majority Vote Classifier
Jin, Gaojie, Yi, Xinping, Huang, Xiaowei
Within the PAC-Bayesian framework, the Gibbs classifier (defined on a posterior $Q$) and the corresponding $Q$-weighted majority vote classifier are commonly used to analyze the generalization performance. However, there exists a notable lack in theoretical research exploring the certified robustness of majority vote classifier and its interplay with generalization. In this study, we develop a generalization error bound that possesses a certified robust radius for the smoothed majority vote classifier (i.e., the $Q$-weighted majority vote classifier with smoothed inputs); In other words, the generalization bound holds under any data perturbation within the certified robust radius. As a byproduct, we find that the underpinnings of both the generalization bound and the certified robust radius draw, in part, upon weight spectral norm, which thereby inspires the adoption of spectral regularization in smooth training to boost certified robustness. Utilizing the dimension-independent property of spherical Gaussian inputs in smooth training, we propose a novel and inexpensive spectral regularizer to enhance the smoothed majority vote classifier. In addition to the theoretical contribution, a set of empirical results is provided to substantiate the effectiveness of our proposed method.
RFG: Test-Time Scaling for Diffusion Large Language Model Reasoning with Reward-Free Guidance
Chen, Tianlang, Xu, Minkai, Leskovec, Jure, Ermon, Stefano
Diffusion large language models (dLLMs) have shown great potential in large-scale language modeling, and there is an increasing interest in further improving the capacity to solve complex problems by guiding the reasoning process step by step. Common practice for autoregressive language models typically learns a process reward model with dense annotation for each intermediate step. However, this is challenging for dLLMs where the generation is in an any-order fashion and intermediate states are partially masked sentences. To this end, in this paper, we propose reward-free guidance (RFG), a principled method for guiding the reasoning trajectory of dLLMs without explicit process reward. The key idea of RFG is to parameterize the process reward by log-likelihood ratios of the enhanced and reference dLLMs, where the enhanced model can be easily obtained by any off-the-shelf dLLM that has been post-trained with reinforcement learning (RL) or supervised fine-tuning (SFT). We provide theoretical justification that RFG induces the reward-guided sampling distribution with no additional reward. We conduct comprehensive experiments on four challenging mathematical reasoning and code generation benchmarks using a diverse suite of dLLMs enhanced with various post-training methods. RFG consistently yields significant improvements across all tasks and model types, achieving accuracy gains of up to 9.2%. These findings establish RFG as a general training-free framework that scales test-time reasoning without reliance on external reward models. By scaling up mask-predict pretraining on large-scale corpora through bidirectional computation, dLLMs have shown surprisingly competitive or even superior performance over autoregressive (AR) model baselines (Prabhudesai et al., 2025). Despite the impressive advancements, the current success of dLLMs is primarily limited to pre-training or continue-training on a specific domain, with limited exploration in test-time computation and alignment.
Crowdsourcing Without People: Modelling Clustering Algorithms as Experts
Lorentz, Jordyn E. A., Clark, Katharine M.
This paper introduces mixsemble, an ensemble method that adapts the Dawid-Skene model to aggregate predictions from multiple model-based clustering algorithms. Unlike traditional crowdsourcing, which relies on human labels, the framework models the outputs of clustering algorithms as noisy annotations. Experiments on both simulated and real-world datasets show that, although the mixsemble is not always the single top performer, it consistently approaches the best result and avoids poor outcomes. This robustness makes it a practical alternative when the true data structure is unknown, especially for non-expert users.
Learning to Condition: A Neural Heuristic for Scalable MPE Inference
Malhotra, Brij, Arya, Shivvrat, Rahman, Tahrima, Gogate, Vibhav Giridhar
We introduce learning to condition (L2C), a scalable, data-driven framework for accelerating Most Probable Explanation (MPE) inference in Probabilistic Graphical Models (PGMs), a fundamentally intractable problem. L2C trains a neural network to score variable-value assignments based on their utility for conditioning, given observed evidence. To facilitate supervised learning, we develop a scalable data generation pipeline that extracts training signals from the search traces of existing MPE solvers. The trained network serves as a heuristic that integrates with search algorithms, acting as a conditioning strategy prior to exact inference or as a branching and node selection policy within branch-and-bound solvers. We evaluate L2C on challenging MPE queries involving high-treewidth PGMs. Experiments show that our learned heuristic significantly reduces the search space while maintaining or improving solution quality over state-of-the-art methods.
Characterization and Learning of Causal Graphs with Latent Confounders and Post-treatment Selection from Interventional Data
Luo, Gongxu, Li, Loka, Chen, Guangyi, Dai, Haoyue, Zhang, Kun
Interventional causal discovery seeks to identify causal relations by leveraging distributional changes introduced by interventions, even in the presence of latent confounders. Beyond the spurious dependencies induced by latent confounders, we highlight a common yet often overlooked challenge in the problem due to post-treatment selection, in which samples are selectively included in datasets after interventions. This fundamental challenge widely exists in biological studies; for example, in gene expression analysis, both observational and interventional samples are retained only if they meet quality control criteria (e.g., highly active cells). Neglecting post-treatment selection may introduce spurious dependencies and distributional changes under interventions, which can mimic causal responses, thereby distorting causal discovery results and challenging existing causal formulations. To address this, we introduce a novel causal formulation that explicitly models post-treatment selection and reveals how its differential reactions to interventions can distinguish causal relations from selection patterns, allowing us to go beyond traditional equivalence classes toward the underlying true causal structure. We then characterize its Markov properties and propose a Fine-grained Interventional equivalence class, named FI-Markov equivalence, represented by a new graphical diagram, F-PAG. Finally, we develop a provably sound and complete algorithm, F-FCI, to identify causal relations, latent confounders, and post-treatment selection up to $\mathcal{FI}$-Markov equivalence, using both observational and interventional data. Experimental results on synthetic and real-world datasets demonstrate that our method recovers causal relations despite the presence of both selection and latent confounders.