 Bayesian Inference


Failure Prediction from Limited Hardware Demonstrations

arXiv.org Artificial Intelligence

Prediction of failures in real-world robotic systems requires either accurate model information or extensive testing. Partial knowledge of the system model makes simulation-based failure prediction unreliable. Moreover, obtaining demonstrations from the true system is expensive, and repeated failures during data collection can put the robotic system at risk. This work presents a novel three-step methodology for discovering failures that occur in the true system by combining a limited number of demonstrations from the true system with the failure information obtained through sampling-based testing of a model dynamical system. Given a limited budget $N$ of demonstrations from the true system and a model dynamics (with potentially large modeling errors), the proposed methodology comprises a) exhaustive simulations for discovering algorithmic failures using the model dynamics; b) design of the initial $N_1$ demonstrations of the true system using Bayesian inference to learn a Gaussian process regression (GPR)-based failure predictor; and c) iterative $N - N_1$ demonstrations of the true system for updating the failure predictor. To illustrate the efficacy of the proposed methodology, we consider: a) failure discovery for the task of pushing a T block to a fixed target region with a UR3e collaborative robot arm using a diffusion policy; and b) failure discovery for an F1-Tenth racing car tracking a given raceline under an LQR control policy.
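
The three steps above can be sketched on a one-dimensional toy problem. This is only an illustration under stated assumptions, not the authors' method: `model_margin` and `true_margin` are hypothetical stand-ins for the simulator and the hardware, the initial demonstrations are simply evenly spaced rather than designed by Bayesian inference, the GP models the residual between the two, and new demonstrations are chosen by maximum posterior variance.

```python
import math

def rbf(a, b, length=1.0):
    """Squared-exponential kernel on scalars."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def gp_posterior(xs, ys, x_new, noise=1e-3):
    """Posterior mean and variance of a zero-mean GP at x_new."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k_star = [rbf(a, x_new) for a in xs]
    mean = sum(k * w for k, w in zip(k_star, solve(K, ys)))
    var = 1.0 - sum(k * w for k, w in zip(k_star, solve(K, k_star)))
    return mean, var

# Hypothetical stand-ins: a negative margin means the task fails.
def model_margin(x):   # imperfect simulator
    return math.cos(x)

def true_margin(x):    # expensive hardware demonstration
    return math.cos(x) - 0.3 * x

def discover_failures(candidates, n_total, n_init):
    # (a) The cheap model is evaluated everywhere and acts as the prior mean.
    # (b) Initial demonstrations at evenly spaced candidates (an assumption).
    step = max(1, len(candidates) // n_init)
    tested = candidates[::step][:n_init]
    resid = [true_margin(x) - model_margin(x) for x in tested]
    # (c) Spend the remaining budget where the predictor is least certain.
    while len(tested) < n_total:
        untested = [x for x in candidates if x not in tested]
        x_next = max(untested, key=lambda x: gp_posterior(tested, resid, x)[1])
        tested.append(x_next)
        resid.append(true_margin(x_next) - model_margin(x_next))
    predict = lambda x: model_margin(x) + gp_posterior(tested, resid, x)[0]
    return tested, predict

candidates = [0.1 * i for i in range(41)]
tested, predict = discover_failures(candidates, n_total=8, n_init=3)
predicted_failures = [x for x in candidates if predict(x) < 0.0]
```

Fitting the GP to the residual rather than to the margin itself lets the simulator carry most of the structure, so the few hardware demonstrations are spent on correcting the modeling error.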


pyhgf: A neural network library for predictive coding

arXiv.org Artificial Intelligence

Bayesian models of cognition have gained considerable traction in computational neuroscience and psychiatry. Their scope is now expected to expand rapidly to artificial intelligence, providing general inference frameworks to support embodied, adaptable, and energy-efficient autonomous agents. A central theory in this domain is predictive coding, which posits that learning and behaviour are driven by hierarchical probabilistic inferences about the causes of sensory inputs. Biological realism constrains these networks to rely on simple local computations in the form of precision-weighted predictions and prediction errors. This can make the framework highly efficient, but its implementation comes with unique challenges on the software development side. Embedding such models in standard neural network libraries often becomes limiting, as these libraries' compilation and differentiation backends can force a conceptual separation between optimization algorithms and the systems being optimized. This critically departs from other biological principles such as self-monitoring, self-organisation, cellular growth and functional plasticity. In this paper, we introduce pyhgf: a Python package backed by JAX and Rust for creating, manipulating and sampling dynamic networks for predictive coding. We improve over other frameworks by enclosing the network components as transparent, modular and malleable variables in the message-passing steps. The resulting graphs can implement computations of arbitrary complexity as belief propagation. But the transparency of core variables can also translate into inference processes that leverage self-organisation principles, and express structure learning, meta-learning or causal discovery as the consequence of network structural adaptation to surprising inputs. The code, tutorials and documentation are hosted at: https://github.com/ilabcode/pyhgf.
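
To make "precision-weighted prediction errors" concrete, here is a minimal sketch of a single Gaussian belief update. It is a generic textbook update written in plain Python, not the pyhgf API; the function name and arguments are hypothetical.

```python
def precision_weighted_update(mu, pi, y, pi_obs):
    """One Gaussian belief update driven by a precision-weighted prediction error.

    mu, pi    : prior mean and precision (inverse variance) of the belief
    y, pi_obs : observation and its precision
    """
    prediction_error = y - mu
    pi_post = pi + pi_obs  # precisions add
    mu_post = mu + (pi_obs / pi_post) * prediction_error
    return mu_post, pi_post

# Three identical observations pull the belief from 0.0 towards 2.0.
mu, pi = 0.0, 1.0
for y in [2.0, 2.0, 2.0]:
    mu, pi = precision_weighted_update(mu, pi, y, pi_obs=1.0)
```

The update only needs the local belief and the incoming observation, which is the kind of simple local computation the abstract refers to.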


DFM: Interpolant-free Dual Flow Matching

arXiv.org Machine Learning

Continuous normalizing flows (CNFs) can model data distributions with expressive infinite-length architectures. But this modeling involves the computationally expensive process of solving an ordinary differential equation (ODE) during maximum likelihood training. The recently proposed flow matching (FM) framework substantially simplifies the training phase by using a regression objective with the interpolated forward vector field. In this paper, we propose an interpolant-free dual flow matching (DFM) approach without explicit assumptions about the modeled vector field. DFM optimizes the forward and, additionally, a reverse vector field model using a novel objective that facilitates bijectivity of the forward and reverse transformations. Our experiments with SMAP unsupervised anomaly detection show the advantages of DFM over CNFs trained with either maximum likelihood or FM objectives, with state-of-the-art performance metrics.
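
For context, the FM regression objective that DFM is compared against can be sketched as follows. This shows the common linear-interpolant variant of flow matching on scalars, not the DFM objective itself; `v` is a hypothetical velocity model.

```python
import random

def flow_matching_loss(v, pairs, rng):
    """Monte Carlo estimate of the FM regression loss with a linear interpolant.

    v     : hypothetical model, v(x, t) -> predicted velocity (scalar here)
    pairs : list of (x0, x1), x0 drawn from the base and x1 from the data
    rng   : random.Random instance used to sample the time t
    """
    total = 0.0
    for x0, x1 in pairs:
        t = rng.random()                # t ~ Uniform[0, 1)
        x_t = (1.0 - t) * x0 + t * x1   # point on the straight path
        target = x1 - x0                # velocity of that path
        total += (v(x_t, t) - target) ** 2
    return total / len(pairs)

rng = random.Random(0)
loss = flow_matching_loss(lambda x, t: 0.0, [(0.0, 2.0), (1.0, 3.0)], rng)
```

No ODE is solved during training: the model is regressed directly onto the velocity implied by the chosen interpolant, which is the assumption DFM sets out to remove.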


Optimal Downsampling for Imbalanced Classification with Generalized Linear Models

arXiv.org Machine Learning

Downsampling or under-sampling is a technique that is utilized in the context of large and highly imbalanced classification models. We study optimal downsampling for imbalanced classification using generalized linear models (GLMs). We propose a pseudo maximum likelihood estimator and study its asymptotic normality in the context of increasingly imbalanced populations relative to an increasingly large sample size. We provide theoretical guarantees for the introduced estimator. Additionally, we compute the optimal downsampling rate using a criterion that balances statistical accuracy and computational efficiency. Our numerical experiments, conducted on both synthetic and empirical data, further validate our theoretical results, and demonstrate that the introduced estimator outperforms commonly available alternatives.
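
As a point of reference, the sketch below shows plain negative downsampling together with the standard intercept correction for logistic regression. It is not the pseudo maximum likelihood estimator proposed in the paper, and the function names are hypothetical.

```python
import math
import random

def downsample_negatives(labels, keep_rate, rng):
    """Keep every positive; keep each negative with probability keep_rate."""
    return [i for i, y in enumerate(labels)
            if y == 1 or rng.random() < keep_rate]

def correct_intercept(intercept_sub, keep_rate):
    """Map a logistic intercept fitted on the subsample back to the full population.

    Dropping negatives inflates the odds of a positive by 1 / keep_rate,
    so the log-odds are shifted back by log(keep_rate).
    """
    return intercept_sub + math.log(keep_rate)

rng = random.Random(0)
labels = [1] * 5 + [0] * 995   # highly imbalanced toy labels
kept = downsample_negatives(labels, keep_rate=0.1, rng=rng)
```

A smaller `keep_rate` lowers the cost of fitting but discards more information about the negatives, which is the accuracy versus computation trade-off the optimal rate is meant to balance.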


Linear-cost unbiased posterior estimates for crossed effects and matrix factorization models via couplings

arXiv.org Machine Learning

In recent years, unbiased Markov Chain Monte Carlo via couplings (UMCMC) has emerged as a promising framework to remove bias from MCMC estimates, thus potentially allowing for early stopping, simplifying the convergence diagnostic process and facilitating parallelization (Glynn and Rhee, 2014; Jacob et al., 2020). In UMCMC, coupled chains are run for a random number of iterations (at least up to coalescence) and their values are combined to produce unbiased estimates. A natural question is whether this class of estimates incurs a greater computational cost than conventional MCMC based on simple ergodic averages and, if so, how large the difference is. Framing the question differently, one may ask whether it is possible to devise UMCMC methods whose computational cost matches that of top-performing MCMC methods, while enjoying the above-mentioned benefits. In a different line of research, various works have shown how carefully designed blocked Gibbs samplers (BGSs), i.e. Gibbs sampling schemes that update entire blocks of coordinates jointly, can achieve state-of-the-art performance for sampling from the posterior distributions of various challenging high-dimensional Bayesian models, such as non-nested models with crossed dependencies (Papaspiliopoulos et al., 2019, 2023). In particular, BGSs achieve computational costs that are linear in the number of parameters and observations in asymptotic regimes where both diverge to infinity.
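
For readers unfamiliar with how the coupled chains are combined, the estimator of Jacob et al. (2020), cited above, takes the following form. The notation is an assumption and does not appear in the abstract: $h$ is the test function, $k$ a burn-in index, and $(X_t)$, $(Y_t)$ two chains with the same marginal law, coupled so that they meet at a random time $\tau$.

```latex
H_k(X, Y) = h(X_k) + \sum_{t=k+1}^{\tau-1} \bigl( h(X_t) - h(Y_{t-1}) \bigr),
\qquad \tau = \inf\{ t \ge 1 : X_t = Y_{t-1} \}.
```

When the chains meet before time $k+1$ the sum is empty and the estimate reduces to $h(X_k)$, so the cost of each estimate is driven by the meeting time $\tau$.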


Causal machine learning for predicting treatment outcomes

arXiv.org Machine Learning

Causal machine learning (ML) offers flexible, data-driven methods for predicting treatment outcomes. Here, we present how methods from causal ML can be used to understand the effectiveness of treatments, thereby supporting the assessment and safety of drugs. A key benefit of causal ML is that it allows for estimating individualized treatment effects, as well as personalized predictions of potential patient outcomes under different treatments. This offers granular insights into when treatments are effective, so that decision-making in patient care can be personalized to individual patient profiles. We further discuss how causal ML can be used in combination with both clinical trial data and real-world data such as clinical registries and electronic health records. We finally provide recommendations for the reliable use of causal ML in medicine. First published in Nature Medicine, 30, 958-968 (2024) by Springer Nature. Assessing the effectiveness of treatments is crucial to ensure patient safety and personalize patient care. Recent innovations in machine learning (ML) offer new, data-driven methods to estimate treatment effects from data. This branch of ML is commonly referred to as causal ML as it aims to predict a causal quantity, namely, the patient outcomes due to treatment [1]. Causal ML can be used to estimate treatment effects from both experimental data obtained through randomized controlled trials (RCTs) and observational data obtained from clinical registries, electronic health records, and other real-world data (RWD) sources to generate clinical evidence. A key strength of causal ML is that it allows estimating individualized treatment effects, as well as making personalized predictions of potential patient outcomes under different treatments.


Deterministic Langevin Monte Carlo with Normalizing Flows for Bayesian Inference

Neural Information Processing Systems

We propose a general-purpose Bayesian inference algorithm for expensive likelihoods, replacing the stochastic term in the Langevin equation with a deterministic density gradient term. The particle density is evaluated from the current particle positions using a Normalizing Flow (NF), which is differentiable and has good generalization properties in high dimensions. We take advantage of NF preconditioning and NF-based Metropolis-Hastings updates for faster convergence. We show on various examples that the method is competitive with state-of-the-art sampling methods.
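
The core idea, moving particles along the difference between the target score and the score of the current particle density, can be sketched in one dimension. This toy version makes two loud simplifications: the target is a hypothetical standard normal, and the particle density is a Gaussian fit standing in for the normalizing flow used in the paper. It also omits the preconditioning and Metropolis-Hastings steps.

```python
def grad_log_target(x):
    """Score of the hypothetical target, a standard normal."""
    return -x

def dlmc_step(particles, step=0.1):
    """Move each particle along grad log p - grad log q, with q a Gaussian fit."""
    n = len(particles)
    mean = sum(particles) / n
    var = sum((x - mean) ** 2 for x in particles) / n
    # Score of the fitted Gaussian q; the paper evaluates q with a normalizing flow.
    grad_log_q = lambda x: -(x - mean) / var
    return [x + step * (grad_log_target(x) - grad_log_q(x)) for x in particles]

# Start from an evenly spaced grid on [2, 6], far from the target.
particles = [2.0 + 4.0 * i / 49 for i in range(50)]
for _ in range(200):
    particles = dlmc_step(particles)

mean = sum(particles) / len(particles)
var = sum((x - mean) ** 2 for x in particles) / len(particles)
```

No random numbers are drawn: the `-grad_log_q` term spreads the particles in the way the noise term of the Langevin equation would on average, and in this toy setting the particle mean and variance settle at the target's 0 and 1.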


Scalable Bayesian inference of dendritic voltage via spatiotemporal recurrent state space models

Neural Information Processing Systems

Recent advances in optical voltage sensors have brought us closer to a critical goal in cellular neuroscience: imaging the full spatiotemporal voltage on a dendritic tree. However, current sensors and imaging approaches still face significant limitations in SNR and sampling frequency; therefore statistical denoising and interpolation methods remain critical for understanding single-trial spatiotemporal dendritic voltage dynamics. Previous denoising approaches were either based on an inadequate linear voltage model or scaled poorly to large trees. Here we introduce a scalable fully Bayesian approach. We develop a generative nonlinear model that requires few parameters per compartment of the cell but is nonetheless flexible enough to sample realistic spatiotemporal data.


Minimum Stein Discrepancy Estimators

Neural Information Processing Systems

When maximum likelihood estimation is infeasible, one often turns to score matching, contrastive divergence, or minimum probability flow to obtain tractable parameter estimates. We provide a unifying perspective of these techniques as minimum Stein discrepancy estimators, and use this lens to design new diffusion kernel Stein discrepancy (DKSD) and diffusion score matching (DSM) estimators with complementary strengths. We establish the consistency, asymptotic normality, and robustness of DKSD and DSM estimators, then derive stochastic Riemannian gradient descent algorithms for their efficient optimisation. The main strength of our methodology is its flexibility, which allows us to design estimators with desirable properties for specific models at hand by carefully selecting a Stein discrepancy. We illustrate this advantage for several challenging problems for score matching, such as non-smooth, heavy-tailed or light-tailed densities.


Maximum-Likelihood Inverse Reinforcement Learning with Finite-Time Guarantees

Neural Information Processing Systems

Inverse reinforcement learning (IRL) aims to recover the reward function and the associated optimal policy that best fit observed sequences of states and actions implemented by an expert. Many algorithms for IRL have an inherent nested structure: the inner loop finds the optimal policy given parametrized rewards while the outer loop updates the estimates towards optimizing a measure of fit. For high-dimensional environments, such a nested-loop structure entails a significant computational burden. To reduce the computational burden of a nested loop, novel methods such as SQIL (Reddy et al., 2019) and IQ-Learn (Garg et al., 2021) emphasize policy estimation at the expense of reward estimation accuracy. However, without accurately estimated rewards, it is not possible to do counterfactual analysis such as predicting the optimal policy under different environment dynamics and/or learning new tasks.