 Directed Networks


Diffusion Models Meet Contextual Bandits

Neural Information Processing Systems

Efficient online decision-making in contextual bandits is challenging, as methods without informative priors often suffer from computational or statistical inefficiencies. In this work, we leverage pre-trained diffusion models as expressive priors to capture complex action dependencies and develop a practical algorithm that efficiently approximates posteriors under such priors, enabling both fast updates and sampling. Empirical results demonstrate the effectiveness and versatility of our approach across diverse contextual bandit settings.


ProDAG: Projected Variational Inference for Directed Acyclic Graphs

Neural Information Processing Systems

Directed acyclic graph (DAG) learning is a central task in structure discovery and causal inference. Although the field has witnessed remarkable advances over the past few years, it remains statistically and computationally challenging to learn a single (point estimate) DAG from data, let alone provide uncertainty quantification. We address the difficult task of quantifying graph uncertainty by developing a Bayesian variational inference framework based on novel, provably valid distributions that have support directly on the space of sparse DAGs. These distributions, which we use to define our prior and variational posterior, are induced by a projection operation that maps an arbitrary continuous distribution onto the space of sparse weighted acyclic adjacency matrices. While this projection is combinatorial, it can be solved efficiently using recent continuous reformulations of acyclicity constraints. We empirically demonstrate that our method, ProDAG, can outperform state-of-the-art alternatives in both accuracy and uncertainty quantification.
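As a rough illustration of what a continuous reformulation of acyclicity can look like, the sketch below uses the polynomial penalty h(W) = tr[(I + (W∘W)/d)^d] − d, which is zero exactly when the weighted adjacency matrix W describes a DAG. This particular penalty is an assumption chosen for illustration and is not necessarily the one ProDAG uses; the function names are hypothetical.

```python
# Illustrative sketch only: a polynomial acyclicity penalty on a weighted
# adjacency matrix. Assumption: the abstract does not say which continuous
# reformulation ProDAG relies on, so this choice is for illustration.

def matmul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def acyclicity_penalty(w):
    """Return tr[(I + (W∘W)/d)^d] - d, which is 0 iff W has no directed cycle."""
    d = len(w)
    # Squaring the weights makes every entry non-negative, so contributions
    # from different cycles cannot cancel each other out.
    m = [[(1.0 if i == j else 0.0) + w[i][j] ** 2 / d for j in range(d)]
         for i in range(d)]
    power = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    for _ in range(d):
        power = matmul(power, m)
    return sum(power[i][i] for i in range(d)) - d


# A chain 0 -> 1 -> 2 is acyclic; adding the edge 2 -> 0 closes a cycle.
dag = [[0.0, 1.5, 0.0], [0.0, 0.0, -2.0], [0.0, 0.0, 0.0]]
cyclic = [[0.0, 1.5, 0.0], [0.0, 0.0, -2.0], [0.7, 0.0, 0.0]]

penalty_dag = acyclicity_penalty(dag)
penalty_cyclic = acyclicity_penalty(cyclic)
```

Because the penalty is a smooth function of W, it can serve as a constraint in gradient-based optimization, which is what makes a projection onto acyclic matrices tractable in principle.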


MoleBridge: Synthetic Space Projecting with Discrete Markov Bridges

Neural Information Processing Systems

Molecular synthetic space projecting is a critical technique in de novo molecular design, which aims to rectify molecules without synthesizability guarantees by converting them into synthetic postfix notations. However, the vast synthesizable chemical space and the discrete data modalities involved pose significant challenges to postfix notation conversion benchmarking. In this paper, we exploit conditional probability transitions in discrete state space and introduce MoleBridge, a deep generative model built on the Markov bridge approach for designing postfix notations of molecular synthesis pathways. MoleBridge consists of two iterative optimizations: i) autoregressive extension of notation tokens from molecular graphs, and ii) generation of discrete reaction postfix notations through a Markov bridge, where noisy token blocks are progressively denoised over multi-step iterations. For the challenging second iteration, which demands sensitivity to incorrect generative probability paths within intricate chemical spaces, we employ a thinking and denoising separation approach to denoise. Empirically, we find that MoleBridge is capable of accurately predicting synthesis pathways while exhibiting excellent performance in a variety of application scenarios.


From Indicators to Insights: Diversity-Optimized for Medical Series-Text Decoding via LLMs

Neural Information Processing Systems

Medical time-series analysis differs fundamentally from general time-series analysis in that it requires specialized domain knowledge to interpret complex signals and clinical context. Large language models (LLMs) hold great promise for augmenting medical time-series analysis by complementing raw series with rich contextual knowledge drawn from biomedical literature and clinical guidelines. However, realizing this potential depends on precise and meaningful prompts that guide the LLM to key information. Determining what constitutes effective prompt content remains non-trivial, especially in medical settings where signal interpretation often hinges on subtle, expert-defined decision-making indicators. To this end, we propose InDiGO, a knowledge-aware evolutionary learning framework that integrates clinical signals and decision-making indicators through iterative optimization. Across four medical benchmarks, InDiGO consistently outperforms prior methods.


Posterior Contraction for Sparse Neural Networks in Besov Spaces with Intrinsic Dimensionality

Neural Information Processing Systems

This work establishes that sparse Bayesian neural networks achieve optimal posterior contraction rates over anisotropic Besov spaces and their hierarchical compositions. These structures reflect the intrinsic dimensionality of the underlying function, thereby mitigating the curse of dimensionality. Our analysis shows that Bayesian neural networks equipped with either sparse or continuous shrinkage priors attain the optimal rates which are dependent on the intrinsic dimension of the true structures. Moreover, we show that these priors enable rate adaptation, allowing the posterior to contract at the optimal rate even when the smoothness level of the true function is unknown. The proposed framework accommodates a broad class of functions, including additive and multiplicative Besov functions as special cases. These results advance the theoretical foundations of Bayesian neural networks and provide rigorous justification for their practical effectiveness in high-dimensional, structured estimation problems.


Agnostic Active Learning Is Always Better Than Passive Learning

Neural Information Processing Systems

We provide the first sharp characterization of the optimal first-order query complexity of agnostic active learning, and propose a new general active learning algorithm which achieves it. Remarkably, the optimal query complexity admits a leading term which is always strictly smaller than the sample complexity of passive supervised learning (by a factor proportional to the best-in-class error rate). This was not previously known to be possible. For comparison, in all previous general analyses, the leading term exhibits an additional factor, such as the disagreement coefficient or related complexity measures, and therefore only provides improvements over passive learning in restricted cases. The present work completely removes such factors from the leading term, implying that every concept class benefits from active learning in the non-realizable case. Whether such benefits are possible has been the driving question underlying the past two decades of research on the theory of agnostic active learning. This work finally settles this fundamental question.


Action-BED: Task-Driven Bayesian Experimental Design with Singly Intractable Objectives

arXiv.org Machine Learning

Bayesian experimental design (BED) has traditionally been based on maximising expected uncertainty reductions from prior to posterior. A major shortfall of this approach is that it leads to doubly intractable objectives that are difficult to optimise, while customising them to particular downstream tasks of interest can also be difficult. Following first principles decision theory, we demonstrate that BED can alternatively be formulated in terms of an expected future loss (EFL) on downstream actions, providing a simple and naturally task-driven framework. Critically, we then show that all such EFLs can be rearranged into singly intractable objectives that can be jointly optimised with respect to both the design policy and a downstream action policy using stochastic gradients, an approach we refer to as ACTION-BED. This formulation further sidesteps the need for any explicit posterior or marginal likelihood estimation and is naturally implicit, requiring only the ability to sample from the joint model over model parameters and data, and evaluate the downstream loss function. It thus allows design policies to be learned more effectively, efficiently, and simply than existing methods, while providing easy customisation to different downstream tasks and losses.


Bayesian Model Averaging under Predictor Redundancy via Density-Ratio Posterior Compression

arXiv.org Machine Learning

Bayesian model averaging in support-indexed regression induces a posterior distribution over active predictor supports. Under predictor redundancy, posterior mass can spread across many nearly interchangeable supports, making exact-support summaries unstable or hard to interpret even when prediction is stable. We study how to report an already fitted Bayesian model averaging posterior without changing the Bayesian target. A report uses hard or soft regions of support space, and its compressed reporting law is compared with the reference posterior through an explicit density ratio. This ratio gives computable total-variation and Kullback--Leibler distortion, bounds for bounded predictive summaries, retained-mass diagnostics, and fallback-weight diagnostics. The framework covers fixed hard regions, metric-ball regions, posterior-cluster regions, and pooled-pruned region dictionaries. We prove exact error formulas and validation bounds for these region reports, and give conditions under which a few regions can replace a long list of individual supports. In simulations, our region reports often give shorter and clearer summaries while preserving the main posterior information, and the density-ratio diagnostics show when too much information has been lost.
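To make the density-ratio idea concrete, the sketch below works through the simplest case: a single hard region whose compressed law is the reference posterior conditioned on that region. This construction, the toy posterior, and all function names are assumptions for illustration and may differ from the paper's definitions. In this case the total-variation distortion equals the discarded mass and the Kullback-Leibler distortion equals the negative log of the retained mass.

```python
import math

# Illustrative sketch only. Assumption: the compressed reporting law is the
# reference posterior restricted to a hard region and renormalised; the
# paper's own construction may be more general.


def condition_on_region(posterior, region):
    """Restrict a discrete posterior over supports to `region` and renormalise."""
    retained = sum(p for s, p in posterior.items() if s in region)
    compressed = {s: (p / retained if s in region else 0.0)
                  for s, p in posterior.items()}
    return compressed, retained


def tv_and_kl(posterior, compressed):
    """Distortion computed from the density ratio r = compressed / posterior.

    Assumes every support listed in `posterior` has positive mass.
    """
    tv, kl = 0.0, 0.0
    for s, p in posterior.items():
        q = compressed[s]
        ratio = q / p
        tv += 0.5 * p * abs(ratio - 1.0)  # TV = (1/2) E_p |r - 1|
        if q > 0.0:
            kl += q * math.log(ratio)     # KL(compressed || posterior)
    return tv, kl


# Toy posterior over active predictor supports (hypothetical numbers).
posterior = {
    frozenset({1, 2}): 0.5,
    frozenset({1, 3}): 0.3,
    frozenset({4}): 0.2,
}
region = {frozenset({1, 2}), frozenset({1, 3})}

compressed, retained = condition_on_region(posterior, region)
tv, kl = tv_and_kl(posterior, compressed)
```

Here the two retained supports carry 0.8 of the posterior mass, so the total-variation distortion is 0.2, which is the kind of retained-mass diagnostic the abstract refers to.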


Adversarial observations in probabilistic State-Space Models for robust Reinforcement Learning

arXiv.org Machine Learning

Machine learning (ML) systems increasingly support decision-making in high-stakes settings such as robotics, autonomous systems, finance, homeland security, and critical infrastructure protection. In these domains, robustness and reliability are essential because failures can translate into physical harm, financial loss, or operational breakdown (García and Fernández, 2015). A recurring weakness is that many ML pipelines implicitly assume that training and deployment data are independent and identically distributed (i.i.d.), even though real deployments often violate this assumption through sensor drift, changing environments, and distribution shift (Quiñonero-Candela et al., 2009). In security-relevant contexts, this problem is amplified because adversaries can deliberately manipulate observations, rewards, or the environment to induce targeted shifts and drive the system toward failure (Barreno et al., 2006; Biggio and Roli, 2018; Vassilev et al., 2024). These concerns motivate the relatively recent field of adversarial machine learning (AML), which studies how malicious perturbations can break learning systems and how to design defenses against them (Biggio and Roli, 2018; Goodfellow, Shlens and Szegedy, 2015).


Leveraging tails for adaptation

arXiv.org Machine Learning

A central goal in nonparametric statistics is adaptation: the ability of an estimator to perform simultaneously and optimally across a wide variety of settings with little to no tuning. When inference is carried out over a class of functional spaces, it is desirable that the estimator automatically adapts to unknown features of these spaces, such as smoothness, geometry, sparsity or other finer structural properties. A large body of literature has focused on adaptation: Lepski's method Lepskiĭ [1990, 1991], thresholding Donoho et al. [1995] and model selection Barron et al. [1999] are amongst the most well-known non-Bayesian approaches. Bayesian methods, on the other hand, have a natural ability to achieve adaptation, as we discuss in more detail below, by choosing prior distributions that are flexible enough to achieve this task (one possibility is for instance to draw certain prior parameters at random in a hierarchical Bayes fashion). Recently, motivated by the remarkable empirical success of deep learning methods, there has been a growing interest in understanding how neural networks can automatically learn structural parameters, such as smoothness of functions or 'effective' dimensions, for instance in regression settings exhibiting a compositional structure as in Schmidt-Hieber [2020], Kohler and Langer [2021] or for data lying on geometric structures (e.g.