Goto

Collaborating Authors

 Bayesian Learning


Generative Bayesian Filtering and Parameter Learning

arXiv.org Machine Learning

Generative Bayesian Filtering (GBF) provides a powerful and flexible framework for performing posterior inference in complex nonlinear and non-Gaussian state-space models. Our approach extends Generative Bayesian Computation (GBC) to dynamic settings, enabling recursive posterior inference using simulation-based methods powered by deep neural networks. GBF does not require explicit density evaluations, making it particularly effective when observation or transition distributions are analytically intractable. To address parameter learning, we introduce the Generative-Gibbs sampler, which bypasses explicit density evaluation by iteratively sampling each variable from its implicit full conditional distribution. Such technique is broadly applicable and enables inference in hierarchical Bayesian models with intractable densities, including state-space models. We assess the performance of the proposed methodologies through both simulated and empirical studies, including the estimation of $α$-stable stochastic volatility models. Our findings indicate that GBF significantly outperforms existing likelihood-free approaches in accuracy and robustness when dealing with intractable state-space models.


Online Bayesian Experimental Design for Partially Observed Dynamical Systems

arXiv.org Machine Learning

Bayesian experimental design (BED) provides a principled framework for optimizing data collection, but existing approaches do not apply to crucial real-world settings such as dynamical systems with partial observability, where only noisy and incomplete observations are available. These systems are naturally modeled as state-space models (SSMs), where latent states mediate the link between parameters and data, making the likelihood -- and thus information-theoretic objectives like the expected information gain (EIG) -- intractable. In addition, the dynamical nature of the system requires online algorithms that update posterior distributions and select designs sequentially in a computationally efficient manner. We address these challenges by deriving new estimators of the EIG and its gradient that explicitly marginalize latent states, enabling scalable stochastic optimization in nonlinear SSMs. Our approach leverages nested particle filters (NPFs) for efficient online inference with convergence guarantees. Applications to realistic models, such as the susceptible-infected-recovered (SIR) and a moving source location task, show that our framework successfully handles both partial observability and online computation.


Higher-Order Causal Structure Learning with Additive Models

arXiv.org Machine Learning

Causal structure learning has long been the central task of inferring causal insights from data. Despite the abundance of real-world processes exhibiting higher-order mechanisms, however, an explicit treatment of interactions in causal discovery has received little attention. In this work, we focus on extending the causal additive model (CAM) to additive models with higher-order interactions. This second level of modularity we introduce to the structure learning problem is most easily represented by a directed acyclic hypergraph which extends the DAG. We introduce the necessary definitions and theoretical tools to handle the novel structure we introduce and then provide identifiability results for the hyper DAG, extending the typical Markov equivalence classes. We next provide insights into why learning the more complex hypergraph structure may actually lead to better empirical results. In particular, more restrictive assumptions like CAM correspond to easier-to-learn hyper DAGs and better finite sample complexity. We finally develop an extension of the greedy CAM algorithm which can handle the more complex hyper DAG search space and demonstrate its empirical usefulness in synthetic experiments.


Power Constrained Nonstationary Bandits with Habituation and Recovery Dynamics

arXiv.org Machine Learning

A common challenge for decision makers is selecting actions whose rewards are unknown and evolve over time based on prior policies. For instance, repeated use may reduce an action's effectiveness (habituation), while inactivity may restore it (recovery). These nonstationarities are captured by the Reducing or Gaining Unknown Efficacy (ROGUE) bandit framework, which models real-world settings such as behavioral health interventions. While existing algorithms can compute sublinear regret policies to optimize these settings, they may not provide sufficient exploration due to overemphasis on exploitation, limiting the ability to estimate population-level effects. This is a challenge of particular interest in micro-randomized trials (MRTs) that aid researchers in developing just-in-time adaptive interventions that have population-level effects while still providing personalized recommendations to individuals. In this paper, we first develop ROGUE-TS, a Thompson Sampling algorithm tailored to the ROGUE framework, and provide theoretical guarantees of sublinear regret. We then introduce a probability clipping procedure to balance personalization and population-level learning, with quantified trade-off that balances regret and minimum exploration probability. Validation on two MRT datasets concerning physical activity promotion and bipolar disorder treatment shows that our methods both achieve lower regret than existing approaches and maintain high statistical power through the clipping procedure without significantly increasing regret. This enables reliable detection of treatment effects while accounting for individual behavioral dynamics. For researchers designing MRTs, our framework offers practical guidance on balancing personalization with statistical validity.


CLAX: Fast and Flexible Neural Click Models in JAX

arXiv.org Artificial Intelligence

CLAX is a JAX-based library that implements classic click models using modern gradient-based optimization. While neural click models have emerged over the past decade, complex click models based on probabilistic graphical models (PGMs) have not systematically adopted gradient-based optimization, preventing practitioners from leveraging modern deep learning frameworks while preserving the interpretability of classic models. CLAX addresses this gap by replacing EM-based optimization with direct gradient-based optimization in a numerically stable manner. The framework's modular design enables the integration of any component, from embeddings and deep networks to custom modules, into classic click models for end-to-end optimization. We demonstrate CLAX's efficiency by running experiments on the full Baidu-ULTR dataset comprising over a billion user sessions in $\approx$ 2 hours on a single GPU, orders of magnitude faster than traditional EM approaches. CLAX implements ten classic click models, serving both industry practitioners seeking to understand user behavior and improve ranking performance at scale and researchers developing new click models. CLAX is available at: https://github.com/philipphager/clax


A Modular, Data-Free Pipeline for Multi-Label Intention Recognition in Transportation Agentic AI Applications

arXiv.org Artificial Intelligence

In this study, a modular, data-free pipeline for multi-label intention recognition is proposed for agentic AI applications in transportation. Unlike traditional intent recognition systems that depend on large, annotated corpora and often struggle with fine-grained, multi-label discrimination, our approach eliminates the need for costly data collection while enhancing the accuracy of multi-label intention understanding. Specifically, the overall pipeline, named DMTC, consists of three steps: 1) using prompt engineering to guide large language models (LLMs) to generate diverse synthetic queries in different transport scenarios; 2) encoding each textual query with a Sentence-T5 model to obtain compact semantic embeddings; 3) training a lightweight classifier using a novel online focal-contrastive (OFC) loss that emphasizes hard samples and maximizes inter-class separability. The applicability of the proposed pipeline is demonstrated in an agentic AI application in the maritime transportation context. Extensive experiments show that DMTC achieves a Hamming loss of 5.35% and an AUC of 95.92%, outperforming state-of-the-art multi-label classifiers and recent end-to-end SOTA LLM-based baselines. Further analysis reveals that Sentence-T5 embeddings improve subset accuracy by at least 3.29% over alternative encoders, and integrating the OFC loss yields an additional 0.98% gain compared to standard contrastive objectives. In conclusion, our system seamlessly routes user queries to task-specific modules (e.g., ETA information, traffic risk evaluation, and other typical scenarios in the transportation domain), laying the groundwork for fully autonomous, intention-aware agents without costly manual labelling.


Uncovering Bugs in Formal Explainers: A Case Study with PyXAI

arXiv.org Artificial Intelligence

Formal explainable artificial intelligence (XAI) offers unique theoretical guarantees of rigor when compared to other non-formal methods of explainability. However, little attention has been given to the validation of practical implementations of formal explainers. This paper develops a novel methodology for validating formal explainers and reports on the assessment of the publicly available formal explainer PyXAI. The paper documents the existence of incorrect explanations computed by PyXAI on most of the datasets analyzed in the experiments, thereby confirming the importance of the proposed novel methodology for the validation of formal explainers.


Leveraging Discrete Function Decomposability for Scientific Design

arXiv.org Artificial Intelligence

In the era of AI-driven science and engineering, we often want to design discrete objects in silico according to user-specified properties. For example, we may wish to design a protein to bind its target, arrange components within a circuit to minimize latency, or find materials with certain properties. Given a property predictive model, in silico design typically involves training a generative model over the design space (e.g., protein sequence space) to concentrate on designs with the desired properties. Distributional optimization -- which can be formalized as an estimation of distribution algorithm or as reinforcement learning policy optimization -- finds the generative model that maximizes an objective function in expectation. Optimizing a distribution over discrete-valued designs is in general challenging because of the combinatorial nature of the design space. However, many property predictors in scientific applications are decomposable in the sense that they can be factorized over design variables in a way that could in principle enable more effective optimization. For example, amino acids at a catalytic site of a protein may only loosely interact with amino acids of the rest of the protein to achieve maximal catalytic activity. Current distributional optimization algorithms are unable to make use of such decomposability structure. Herein, we propose and demonstrate use of a new distributional optimization algorithm, Decomposition-Aware Distributional Optimization (DADO), that can leverage any decomposability defined by a junction tree on the design variables, to make optimization more efficient. At its core, DADO employs a soft-factorized "search distribution" -- a learned generative model -- for efficient navigation of the search space, invoking graph message-passing to coordinate optimization across linked factors.


Addressing prior dependence in hierarchical Bayesian modeling for PTA data analysis I: Methodology and implementation

arXiv.org Machine Learning

Complex inference tasks, such as those encountered in Pulsar Timing Array (PTA) data analysis, rely on Bayesian frameworks. The high-dimensional parameter space and the strong interdependencies among astrophysical, pulsar noise, and nuisance parameters introduce significant challenges for efficient learning and robust inference. These challenges are emblematic of broader issues in decision science, where model over-parameterization and prior sensitivity can compromise both computational tractability and the reliability of the results. We address these issues in the framework of hierarchical Bayesian modeling by introducing a reparameterization strategy. Our approach employs Normalizing Flows (NFs) to decorrelate the parameters governing hierarchical priors from those of astrophysical interest. The use of NF-based mappings provides both the flexibility to realize the reparametrization and the tractability to preserve proper probability densities. We further adopt i-nessai, a flow-guided nested sampler, to accelerate exploration of complex posteriors. This unified use of NFs improves statistical robustness and computational efficiency, providing a principled methodology for addressing hierarchical Bayesian inference in PTA analysis.


Beyond Maximum Likelihood: Variational Inequality Estimation for Generalized Linear Models

arXiv.org Machine Learning

Generalized linear models (GLMs) are fundamental tools for statistical modeling, with maximum likelihood estimation (MLE) serving as the classical method for parameter inference. While MLE performs well in canonical GLMs, it can become computationally inefficient near the true parameter value. In more general settings with non-canonical or fully general link functions, the resulting optimization landscape is often non-convex, non-smooth, and numerically unstable. To address these challenges, we investigate an alternative estimator based on solving the variational inequality (VI) formulation of the GLM likelihood equations, originally proposed by Juditsky and Nemirovski as an alternative for solving nonlinear least-squares problems. Unlike their focus on algorithmic convergence in monotone settings, we analyze the VI approach from a statistical perspective, comparing it systematically with the MLE. We also extend the theory of VI estimators to a broader class of link functions, including non-monotone cases satisfying a strong Minty condition, and show that it admits weaker smoothness requirements than MLE, enabling faster, more stable, and less locally trapped optimization. Theoretically, we establish both non-asymptotic estimation error bounds and asymptotic normality for the VI estimator, and further provide convergence guarantees for fixed-point and stochastic approximation algorithms. Numerical experiments show that the VI framework preserves the statistical efficiency of MLE while substantially extending its applicability to more challenging GLM settings.