Uncertainty
Timely Clinical Diagnosis through Active Test Selection
Estévez, Silas Ruhrberg, Astorga, Nicolás, van der Schaar, Mihaela
There is growing interest in using machine learning (ML) to support clinical diagnosis, but most approaches rely on static, fully observed datasets and fail to reflect the sequential, resource-aware reasoning clinicians use in practice. Diagnosis remains complex and error prone, especially in high-pressure or resource-limited settings, underscoring the need for frameworks that help clinicians make timely and cost-effective decisions. We propose ACTMED (Adaptive Clinical Test selection via Model-based Experimental Design), a diagnostic framework that integrates Bayesian Experimental Design (BED) with large language models (LLMs) to better emulate real-world diagnostic reasoning. At each step, ACTMED selects the test expected to yield the greatest reduction in diagnostic uncertainty for a given patient. LLMs act as flexible simulators, generating plausible patient state distributions and supporting belief updates without requiring structured, task-specific training data. Clinicians can remain in the loop; reviewing test suggestions, interpreting intermediate outputs, and applying clinical judgment throughout. We evaluate ACTMED on real-world datasets and show it can optimize test selection to improve diagnostic accuracy, interpretability, and resource use. This represents a step toward transparent, adaptive, and clinician-aligned diagnostic systems that generalize across settings with reduced reliance on domain-specific data.
Surrogate Modeling and Explainable Artificial Intelligence for Complex Systems: A Workflow for Automated Simulation Exploration
Saves, Paul, Palar, Pramudita Satria, Robani, Muhammad Daffa, Verstaevel, Nicolas, Garouani, Moncef, Aligon, Julien, Gaudou, Benoit, Shimoyama, Koji, Morlier, Joseph
Complex systems are increasingly explored through simulation-driven engineering workflows that combine physics-based and empirical models with optimization and analytics. Despite their power, these workflows face two central obstacles: (1) high computational cost, since accurate exploration requires many expensive simulator runs; and (2) limited transparency and reliability when decisions rely on opaque blackbox components. We propose a workflow that addresses both challenges by training lightweight emulators on compact designs of experiments that (i) provide fast, low-latency approximations of expensive simulators, (ii) enable rigorous uncertainty quantification, and (iii) are adapted for global and local Explainable Artificial Intelligence (XAI) analyses. This workflow unifies every simulation-based complex-system analysis tool, ranging from engineering design to agent-based models for socio-environmental understanding. In this paper, we proposea comparative methodology and practical recommendations for using surrogate-based explainability tools within the proposed workflow. The methodology supports continuous and categorical inputs, combines global-effect and uncertainty analyses with local attribution, and evaluates the consistency of explanations across surrogate models, thereby diagnosing surrogate adequacy and guiding further data collection or model refinement. We demonstrate the approach on two contrasting case studies: a multidisciplinary design analysis of a hybrid-electric aircraft and an agent-based model of urban segregation. Results show that the surrogate model and XAI coupling enables large-scale exploration in seconds, uncovers nonlinear interactions and emergent behaviors, identifies key design and policy levers, and signals regions where surrogates require more data or alternative architectures.
Can Linear Probes Measure LLM Uncertainty?
Dakhmouche, Ramzi, Letellier, Adrien, Gorji, Hossein
Effective Uncertainty Quantification (UQ) represents a key aspect for reliable deployment of Large Language Models (LLMs) in automated decision-making and beyond. Yet, for LLM generation with multiple choice structure, the state-of-the-art in UQ is still dominated by the naive baseline given by the maximum softmax score. To address this shortcoming, we demonstrate that taking a principled approach via Bayesian statistics leads to improved performance despite leveraging the simplest possible model, namely linear regression. More precisely, we propose to train multiple Bayesian linear models, each predicting the output of a layer given the output of the previous one. Based on the obtained layer-level posterior distributions, we infer the global uncertainty level of the LLM by identifying a sparse combination of distributional features, leading to an efficient UQ scheme. Numerical experiments on various LLMs show consistent improvement over state-of-the-art baselines.
Local Markov Equivalence for PC-style Local Causal Discovery and Identification of Controlled Direct Effects
Loranchet, Timothée, Assaad, Charles K.
Understanding and identifying controlled direct effects (CDEs) is crucial across numerous scientific domains, including public health. While existing methods can identify these effects from causal directed acyclic graphs (DAGs), the true underlying structure is often unknown in practice. Essential graphs, which represent a Markov equivalence class of DAGs characterized by the same set of $d$-separations, provide a more practical and realistic alternative. However, learning the full essential graph is computationally intensive and typically depends on strong, untestable assumptions. In this work, we characterize a local class of graphs, defined relative to a target variable, that share a specific subset of $d$-separations, and introduce a graphical representation of this class, called the local essential graph (LEG). We then present LocPC, a novel algorithm designed to recover the LEG from an observed distribution using only local conditional independence tests. Building on LocPC, we propose LocPC-CDE, an algorithm that discovers the portion of the LEG that is both sufficient and necessary to identify a CDE, bypassing the need of retrieving the full essential graph. Compared to global methods, our algorithms require less conditional independence tests and operate under weaker assumptions while maintaining theoretical guarantees. We illustrate the effectiveness of our approach through simulation studies.
DRAGON: Distributional Rewards Optimize Diffusion Generative Models
Bai, Yatong, Casebeer, Jonah, Sojoudi, Somayeh, Bryan, Nicholas J.
We present Distributional RewArds for Generative OptimizatioN (DRAGON), a versatile framework for fine-tuning media generation models towards a desired outcome. Compared with traditional reinforcement learning with human feedback (RLHF) or pairwise preference approaches such as direct preference optimization (DPO), DRAGON is more flexible. It can optimize reward functions that evaluate either individual examples or distributions of them, making it compatible with a broad spectrum of instance-wise, instance-to-distribution, and distribution-to-distribution rewards. Leveraging this versatility, we construct novel reward functions by selecting an encoder and a set of reference examples to create an exemplar distribution. When cross-modal encoders such as CLAP are used, the reference may be of a different modality (text versus audio). Then, DRAGON gathers online and on-policy generations, scores them with the reward function to construct a positive demonstration set and a negative set, and leverages the contrast between the two finite sets to approximate distributional reward optimization. For evaluation, we fine-tune an audio-domain text-to-music diffusion model with 20 reward functions, including a custom music aesthetics model, CLAP score, Vendi diversity, and Frechet audio distance (FAD). We further compare instance-wise (per-song) and full-dataset FAD settings while ablating multiple FAD encoders and reference sets. Over all 20 target rewards, DRAGON achieves an 81.45% average win rate. Moreover, reward functions based on exemplar sets enhance generations and are comparable to model-based rewards. With an appropriate exemplar set, DRAGON achieves a 60.95% human-voted music quality win rate without training on human preference annotations. DRAGON is a new approach to designing and optimizing reward functions for improving human-perceived quality. Demos at https://ml-dragon.github.io/web
Enhancing failure prediction in nuclear industry: Hybridization of knowledge- and data-driven techniques
Saley, Amaratou Mahamadou, Moyaux, Thierry, Sekhari, Aïcha, Cheutet, Vincent, Danielou, Jean-Baptiste
The convergence of the Internet of Things (IoT) and Industry 4.0 has significantly enhanced data-driven methodologies within the nuclear industry, notably enhancing safety and economic efficiency. This advancement challenges the precise prediction of future maintenance needs for assets, which is crucial for reducing downtime and operational costs. However, the effectiveness of data-driven methodologies in the nuclear sector requires extensive domain knowledge due to the complexity of the systems involved. Thus, this paper proposes a novel predictive maintenance methodology that combines data-driven techniques with domain knowledge from a nuclear equipment. The methodological originality of this paper is located on two levels: highlighting the limitations of purely data-driven approaches and demonstrating the importance of knowledge in enhancing the performance of the predictive models. The applicative novelty of this work lies in its use within a domain such as a nuclear industry, which is highly restricted and ultrasensitive due to security, economic and environmental concerns. A detailed real-world case study which compares the current state of equipment monitoring with two scenarios, demonstrate that the methodology significantly outperforms purely data-driven methods in failure prediction. While purely data-driven methods achieve only a modest performance with a prediction horizon limited to 3 h and a F1 score of 56.36%, the hybrid approach increases the prediction horizon to 24 h and achieves a higher F1 score of 93.12%.
Decision-Making Amid Information-Based Threats in Sociotechnical Systems: A Review
Allred, Aaron R., Richardson, Erin E., Bostrom, Sarah R., Crum, James, Spencer, Cara, Tossell, Chad, Niemeyer, Richard E., Hirshfield, Leanne, Hayman, Allison P. A.
Technological systems increasingly mediate human information exchange, spanning interactions among humans as well as between humans and artificial agents. The unprecedented scale and reliance on information disseminated through these systems substantially expand the scope of information-based influence that can both enable and undermine sound decision-making. Consequently, understanding and protecting decision-making today faces growing challenges, as individuals and organizations must navigate evolving opportunities and information-based threats across varied domains and information environments. While these risks are widely recognized, research remains fragmented: work evaluating information-based threat phenomena has progressed largely in isolation from foundational studies of human information processing. In this review, we synthesize insights from both domains to identify shared cognitive mechanisms that mediate vulnerability to information-based threats and shape behavioral outcomes. Finally, we outline directions for future research aimed at integrating these perspectives, emphasizing the importance of such integration for mitigating human vulnerabilities and aligning human-machine representations.
Scientific Data Compression and Super-Resolution Sampling
Modern scientific simulations, observations, and large-scale experiments generate data at volumes that often exceed the limits of storage, processing, and analysis. This challenge drives the development of data reduction methods that efficiently manage massive datasets while preserving essential physical features and quantities of interest. In many scientific workflows, it is also crucial to enable data recovery from compressed representations - a task known as super-resolution - with guarantees on the preservation of key physical characteristics. A notable example is checkpointing and restarting, which is essential for long-running simulations to recover from failures, resume after interruptions, or examine intermediate results. In this work, we introduce a novel framework for scientific data compression and super-resolution, grounded in recent advances in learning exponential families. Our method preserves and quantifies uncertainty in physical quantities of interest and supports flexible trade-offs between compression ratio and reconstruction fidelity.
DIGing--SGLD: Decentralized and Scalable Langevin Sampling over Time--Varying Networks
Bajwa, Waheed U., Gurbuzbalaban, Mert, Kutbay, Mustafa Ali, Zhu, Lingjiong, Zulqarnain, Muhammad
Sampling from a target distribution induced by training data is central to Bayesian learning, with Stochastic Gradient Langevin Dynamics (SGLD) serving as a key tool for scalable posterior sampling and decentralized variants enabling learning when data are distributed across a network of agents. This paper introduces DIGing-SGLD, a decentralized SGLD algorithm designed for scalable Bayesian learning in multi-agent systems operating over time-varying networks. Existing decentralized SGLD methods are restricted to static network topologies, and many exhibit steady-state sampling bias caused by network effects, even when full batches are used. DIGing-SGLD overcomes these limitations by integrating Langevin-based sampling with the gradient-tracking mechanism of the DIGing algorithm, originally developed for decentralized optimization over time-varying networks, thereby enabling efficient and bias-free sampling without a central coordinator. To our knowledge, we provide the first finite-time non-asymptotic Wasserstein convergence guarantees for decentralized SGLD-based sampling over time-varying networks, with explicit constants. Under standard strong convexity and smoothness assumptions, DIGing-SGLD achieves geometric convergence to an $O(\sqrtη)$ neighborhood of the target distribution, where $η$ is the stepsize, with dependence on the target accuracy matching the best-known rates for centralized and static-network SGLD algorithms using constant stepsize. Numerical experiments on Bayesian linear and logistic regression validate the theoretical results and demonstrate the strong empirical performance of DIGing-SGLD under dynamically evolving network conditions.
Bregman geometry-aware split Gibbs sampling for Bayesian Poisson inverse problems
Faye, Elhadji Cisse, Fall, Mame Diarra, Dobigeon, Nicolas, Barat, Eric
This paper proposes a novel Bayesian framework for solving Poisson inverse problems by devising a Monte Carlo sampling algorithm which accounts for the underlying non-Euclidean geometry. To address the challenges posed by the Poisson likelihood -- such as non-Lipschitz gradients and positivity constraints -- we derive a Bayesian model which leverages exact and asymptotically exact data augmentations. In particular, the augmented model incorporates two sets of splitting variables both derived through a Bregman divergence based on the Burg entropy. Interestingly the resulting augmented posterior distribution is characterized by conditional distributions which benefit from natural conjugacy properties and preserve the intrinsic geometry of the latent and splitting variables. This allows for efficient sampling via Gibbs steps, which can be performed explicitly for all conditionals, except the one incorporating the regularization potential. For this latter, we resort to a Hessian Riemannian Langevin Monte Carlo (HRLMC) algorithm which is well suited to handle priors with explicit or easily computable score functions. By operating on a mirror manifold, this Langevin step ensures that the sampling satisfies the positivity constraints and more accurately reflects the underlying problem structure. Performance results obtained on denoising, deblurring, and positron emission tomography (PET) experiments demonstrate that the method achieves competitive performance in terms of reconstruction quality compared to optimization- and sampling-based approaches.