 Bayesian Learning


Individualized Multi-Treatment Response Curves Estimation using RBF-net with Shared Neurons

arXiv.org Machine Learning

Estimation of heterogeneous treatment effects from observational data has become an important problem. It plays a crucial role in determining the individualized causal effects of a treatment, which then leads to a personalized assignment of optimal treatment (Wendling et al., 2018; Rekkas et al., 2020). Estimation of such heterogeneity however requires reasonable representations from each treatment subgroup. With the increasing availability of large-scale health outcome data such as electronic health records (EHR) data in recent years, it has become possible to develop individualized treatment strategies efficiently. This led to the development of several novel statistical methods, primarily tailored for binary treatment scenarios (Wendling et al., 2018; Cheng et al., 2020), with some accommodating multiple treatment settings (Brown et al., 2020; Chalkou et al., 2021). Most of these approaches are specifically designed for estimating population average treatment effects (ATEs) (Van Der Laan and Rubin, 2006; Chernozhukov et al., 2018; McCaffrey et al., 2013) and more recently, methods are being developed to estimate conditional average treatment effects (CATEs) (Taddy et al., 2016; Wager and Athey, 2018; Künzel et al., 2019; Nie and Wager, 2021). Here, we tackle a generic problem of heterogeneous treatment effect or CATE estimation in a multi-treatment setting, where the treatment responses may share some commonalities.
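The distinction between ATE and CATE in a multi-treatment setting can be written in standard potential-outcomes notation. The symbols below ($K$ treatments, covariates $X$, potential outcomes $Y(t)$, response curves $\mu_t$) are assumptions introduced for illustration and are not taken from the abstract.

```latex
% Assumed notation (illustrative only):
%   T \in \{1,\dots,K\} is the treatment, X the covariates,
%   Y(t) the potential outcome under treatment t.
\begin{align}
\mu_t(x) &= \mathbb{E}\left[\, Y(t) \mid X = x \,\right]
  && \text{(response curve of treatment $t$)}, \\
\tau_{t,t'}(x) &= \mu_t(x) - \mu_{t'}(x)
  && \text{(CATE of $t$ versus $t'$ at $x$)}, \\
\mathrm{ATE}_{t,t'} &= \mathbb{E}_X\left[\, \tau_{t,t'}(X) \,\right]
  && \text{(population average of the CATE)}.
\end{align}
```

Under this notation, "shared commonalities" would correspond to the curves $\mu_1, \dots, \mu_K$ having some structure in common, which is one way to read the motivation for estimating them jointly.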


SMC Is All You Need: Parallel Strong Scaling

arXiv.org Artificial Intelligence

In the general framework of Bayesian inference, the target distribution can only be evaluated up to a constant of proportionality. Classical consistent Bayesian methods such as sequential Monte Carlo (SMC) and Markov chain Monte Carlo (MCMC) have unbounded time complexity requirements. We develop a fully parallel sequential Monte Carlo (pSMC) method which provably delivers parallel strong scaling, i.e. the time complexity (and per-node memory) remains bounded if the number of asynchronous processes is allowed to grow. More precisely, the pSMC has a theoretical convergence rate of MSE$ = O(1/NR)$, where $N$ denotes the number of communicating samples in each processor and $R$ denotes the number of processors. In particular, for suitably-large problem-dependent $N$, as $R \rightarrow \infty$ the method converges to infinitesimal accuracy MSE$=O(\varepsilon^2)$ with a fixed finite time-complexity Cost$=O(1)$ and with no efficiency leakage, i.e. computational complexity Cost$=O(\varepsilon^{-2})$. A number of Bayesian inference problems are taken into consideration to compare the pSMC and MCMC methods.
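The scaling claim can be unpacked with a short piece of arithmetic. This is a sketch under assumptions not stated in the abstract: a problem-dependent constant $C$ in the MSE bound, and a unit cost per sample per processor.

```latex
% Assumptions (illustrative): MSE <= C/(N R) for a constant C > 0,
% and each of the R processors does work proportional to N.
\begin{align}
\mathrm{MSE} \le \frac{C}{N R} = \varepsilon^2
  \quad &\Longrightarrow \quad R = \frac{C}{N \varepsilon^{2}}, \\
\text{time per processor} &= O(N) = O(1)
  \quad \text{for fixed } N, \\
\text{total work} &= N \cdot R = \frac{C}{\varepsilon^{2}} = O(\varepsilon^{-2}).
\end{align}
```

Read this way, accuracy is bought by adding processors rather than by running longer, while the total work matches the usual Monte Carlo rate, which is what "no efficiency leakage" appears to mean here.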


Improved Evidential Deep Learning via a Mixture of Dirichlet Distributions

arXiv.org Artificial Intelligence

This paper explores a modern predictive uncertainty estimation approach, called evidential deep learning (EDL), in which a single neural network model is trained to learn a meta distribution over the predictive distribution by minimizing a specific objective function. Despite their strong empirical performance, recent studies by Bengs et al. identify a fundamental pitfall of the existing methods: the learned epistemic uncertainty may not vanish even in the infinite-sample limit. We corroborate the observation by providing a unifying view of a class of widely used objectives from the literature. Our analysis reveals that the EDL methods essentially train a meta distribution by minimizing a certain divergence measure between the distribution and a sample-size-independent target distribution, resulting in spurious epistemic uncertainty. Grounded in theoretical principles, we propose learning a consistent target distribution by modeling it with a mixture of Dirichlet distributions and learning via variational inference. Afterward, a final meta distribution model distills the learned uncertainty from the target model. Experimental results across various uncertainty-based downstream tasks demonstrate the superiority of our proposed method, and illustrate the practical implications arising from the consistency and inconsistency of learned epistemic uncertainty.


Gaussian Mixture Models for Affordance Learning using Bayesian Networks

arXiv.org Artificial Intelligence

Affordances are fundamental descriptors of relationships between actions, objects and effects. They provide the means whereby a robot can predict effects, recognize actions, select objects and plan its behavior according to desired goals. This paper approaches the problem of an embodied agent exploring the world and learning these affordances autonomously from its sensory experiences. Models exist for learning the structure and the parameters of a Bayesian Network encoding this knowledge. Although Bayesian Networks are capable of dealing with uncertainty and redundancy, previous work assumed complete observability of the discrete sensory data, which may lead to hard errors in the presence of noise. In this paper we consider a probabilistic representation of the sensors by Gaussian Mixture Models (GMMs) and explicitly take into account the probability distribution contained in each discrete affordance concept, which can lead to more accurate learning.


Prior-Dependent Allocations for Bayesian Fixed-Budget Best-Arm Identification in Structured Bandits

arXiv.org Artificial Intelligence

Best arm identification (BAI) addresses the challenge of finding the optimal arm in a bandit environment (Lattimore and Szepesvári, 2020), with wide-ranging applications in online advertising, drug discovery or hyperparameter tuning. BAI is commonly approached through two primary paradigms: fixed-confidence and fixed-budget. In the fixed-confidence setting (Even-Dar et al., 2006; Kaufmann et al., 2016), the objective is to find the optimal arm with a pre-specified confidence level. Conversely, fixed-budget BAI (Audibert et al., 2010; Karnin et al., 2013; Carpentier and Locatelli, 2016) involves identifying the optimal arm within a fixed number of observations. Within this fixed-budget context, two main metrics are used: the probability of error (PoE) (Audibert et al., 2010; Karnin et al., 2013; Carpentier and Locatelli, 2016)--the likelihood of incorrectly identifying the optimal arm--and the simple regret (Bubeck et al., 2009; Russo, 2016; Komiyama et al., 2023)--the expected performance disparity between the chosen and the optimal arm.
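The two fixed-budget metrics can be illustrated with a small simulation. This is a minimal sketch, not the paper's method: it assumes unit-variance Gaussian rewards and a uniform allocation of the budget, and the function names `uniform_allocation_bai` and `evaluate` are hypothetical.

```python
import random


def uniform_allocation_bai(means, budget, rng):
    """Pull each arm budget // K times and return the empirically best arm.

    Assumes unit-variance Gaussian rewards (an illustrative choice).
    """
    k = len(means)
    pulls = budget // k
    estimates = []
    for mu in means:
        total = sum(rng.gauss(mu, 1.0) for _ in range(pulls))
        estimates.append(total / pulls)
    return max(range(k), key=lambda i: estimates[i])


def evaluate(means, budget, n_runs=2000, seed=0):
    """Monte Carlo estimates of probability of error and simple regret."""
    rng = random.Random(seed)
    best = max(range(len(means)), key=lambda i: means[i])
    errors = 0
    regret = 0.0
    for _ in range(n_runs):
        chosen = uniform_allocation_bai(means, budget, rng)
        errors += int(chosen != best)          # PoE counts any wrong pick
        regret += means[best] - means[chosen]  # regret weighs picks by gap
    return errors / n_runs, regret / n_runs


if __name__ == "__main__":
    poe, simple_regret = evaluate([0.0, 0.5, 1.0], budget=30)
    print(f"PoE ~ {poe:.3f}, simple regret ~ {simple_regret:.3f}")
```

The two metrics differ in how they penalize mistakes: PoE treats every wrong arm equally, whereas simple regret is small when the chosen arm is nearly as good as the optimal one.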


Interpretable classifiers for tabular data via discretization and feature selection

arXiv.org Artificial Intelligence

Explainability and human interpretability are becoming an increasingly important part of research on machine learning. In addition to the immediate benefits of explanations and interpretability in scientific contexts, the capacity to provide explanations behind automated decisions has already been widely addressed at the level of legislation. For example, the European General Data Protection Regulation [8] and California Consumer Privacy Act [4] both refer to the right of individuals to get explanations of automated decisions concerning them. This article investigates interpretability in the framework of tabular data. Tabular data is highly important for numerous scientific and real-life contexts, often even regarded as the most important form of data: see, e.g., [22, 2]. The aim of the current article is to introduce an efficient method for extracting highly interpretable binary classifiers from tabular data. While explainable AI (or XAI) methods custom-made for pictures and text cannot be readily used in the setting of tabular data [16], numerous successful XAI methods for tabular data exist. See the survey [20] for an overview of XAI in relation to tabular data. The authors are listed in alphabetical order.


Heart disease risk prediction using deep learning techniques with feature augmentation

arXiv.org Artificial Intelligence

Cardiovascular diseases are among the greatest risks of death for the general population. Late detection of heart disease strongly reduces patients' chances of survival. Age, sex, cholesterol level, sugar level, and heart rate, among other factors, are known to influence life-threatening heart problems, but, because of the large number of variables, it is often difficult for an expert to evaluate each patient while taking all of this information into account. In this manuscript, the authors propose using deep learning methods combined with feature augmentation techniques to evaluate whether patients are at risk of suffering cardiovascular disease. The proposed methods outperform other state-of-the-art methods by 4.4%, reaching a precision of 90%, which is a significant improvement, even more so for an affliction that affects a large population.


Position Paper: Why the Shooting in the Dark Method Dominates Recommender Systems Practice; A Call to Abandon Anti-Utopian Thinking

arXiv.org Artificial Intelligence

Applied recommender systems research is in a curious position. While there is a very rigorous protocol for measuring performance by A/B testing, best practice for finding a 'B' to test does not explicitly target performance but rather targets a proxy measure. The success or failure of a given A/B test then depends entirely on whether the proposed proxy is better correlated with performance than the previous proxy. No principle exists to identify offline whether one proxy is better than another, leaving practitioners shooting in the dark. The purpose of this position paper is to question this anti-Utopian thinking and argue that a non-standard use of the deep learning stacks actually has the potential to unlock reward-optimizing recommendation.


$\mu$GUIDE: a framework for microstructure imaging via generalized uncertainty-driven inference using deep learning

arXiv.org Artificial Intelligence

This work proposes $\mu$GUIDE: a general Bayesian framework to estimate posterior distributions of tissue microstructure parameters from any given biophysical model or MRI signal representation, with exemplar demonstration in diffusion-weighted MRI. Harnessing a new deep learning architecture for automatic signal feature selection combined with simulation-based inference and efficient sampling of the posterior distributions, $\mu$GUIDE bypasses the high computational and time cost of conventional Bayesian approaches and does not rely on acquisition constraints to define model-specific summary statistics. The obtained posterior distributions make it possible to highlight degeneracies present in the model definition and to quantify the uncertainty and ambiguity of the estimated parameters.


cecilia: A Machine Learning-Based Pipeline for Measuring Metal Abundances of Helium-rich Polluted White Dwarfs

arXiv.org Artificial Intelligence

Over the past several decades, conventional spectral analysis techniques of polluted white dwarfs have become powerful tools to learn about the geology and chemistry of extrasolar bodies. Despite their proven capabilities and extensive legacy of scientific discoveries, these techniques are however still limited by their manual, time-intensive, and iterative nature. As a result, they are susceptible to human errors and are difficult to scale up to population-wide studies of metal pollution. This paper seeks to address this problem by presenting cecilia, the first Machine Learning (ML)-powered spectral modeling code designed to measure the metal abundances of intermediate-temperature (10,000$\leq T_{\rm eff} \leq$20,000 K), Helium-rich polluted white dwarfs. Trained with more than 22,000 randomly drawn atmosphere models and stellar parameters, our pipeline aims to overcome the limitations of classical methods by replacing the generation of synthetic spectra from computationally expensive codes and uniformly spaced model grids, with a fast, automated, and efficient neural-network-based interpolator. More specifically, cecilia combines state-of-the-art atmosphere models, powerful artificial intelligence tools, and robust statistical techniques to rapidly generate synthetic spectra of polluted white dwarfs in high-dimensional space, and enable accurate ($\lesssim$0.1 dex) and simultaneous measurements of 14 stellar parameters -- including 11 elemental abundances -- from real spectroscopic observations. As massively multiplexed astronomical surveys begin scientific operations, cecilia's performance has the potential to unlock large-scale studies of extrasolar geochemistry and propel the field of white dwarf science into the era of Big Data. In doing so, we aspire to uncover new statistical insights that were previously impractical with traditional white dwarf characterisation techniques.