Bayesian Inference
K-Fold Causal BART for CATE Estimation
Souto, Hugo Gobato, Neto, Francisco Louzada
This research aims to propose and evaluate a novel model named K-Fold Causal Bayesian Additive Regression Trees (K-Fold Causal BART) for improved estimation of Average Treatment Effects (ATE) and Conditional Average Treatment Effects (CATE). The study employs synthetic and semi-synthetic datasets, including the widely recognized Infant Health and Development Program (IHDP) benchmark dataset, to validate the model's performance. Despite promising results in synthetic scenarios, the IHDP dataset reveals that the proposed model is not state-of-the-art for ATE and CATE estimation. Nonetheless, the research provides several novel insights: 1. The ps-BART model is likely the preferred choice for CATE and ATE estimation due to better generalization compared to the other benchmark models - including the Bayesian Causal Forest (BCF) model, which is considered by many the current best model for CATE estimation, 2. The BCF model's performance deteriorates significantly with increasing treatment effect heterogeneity, while the ps-BART model remains robust, 3. Models tend to be overconfident in CATE uncertainty quantification when treatment effect heterogeneity is low, 4. A second K-Fold method is unnecessary for avoiding overfitting in CATE estimation, as it adds computational costs without improving performance, 5. Detailed analysis reveals the importance of understanding dataset characteristics and using nuanced evaluation methods, 6. The conclusion of Curth et al. (2021) that indirect strategies for CATE estimation are superior for the IHDP dataset is contradicted by the results of this research. These findings challenge existing assumptions and suggest directions for future research to enhance causal inference methodologies.
Recursive Nested Filtering for Efficient Amortized Bayesian Experimental Design
Iqbal, Sahel, Abdulsamad, Hany, Pérez-Vieites, Sara, Särkkä, Simo, Corenflos, Adrien
This paper introduces the Inside-Out Nested Particle Filter (IO-NPF), a novel, fully recursive, algorithm for amortized sequential Bayesian experimental design in the non-exchangeable setting. We frame policy optimization as maximum likelihood estimation in a non-Markovian state-space model, achieving (at most) $\mathcal{O}(T^2)$ computational complexity in the number of experiments. We provide theoretical convergence guarantees and introduce a backward sampling algorithm to reduce trajectory degeneracy. IO-NPF offers a practical, extensible, and provably consistent approach to sequential Bayesian experimental design, demonstrating improved efficiency over existing methods.
Intelligent tutoring systems by Bayesian nets with noisy gates
Antonucci, Alessandro, Mangili, Francesca, Bonesana, Claudio, Adorni, Giorgia
Directed graphical models such as Bayesian nets are often used to implement intelligent tutoring systems able to interact in real-time with learners in a purely automatic way. When coping with such models, keeping a bound on the number of parameters might be important for multiple reasons. First, as these models are typically based on expert knowledge, a huge number of parameters to elicit might discourage practitioners from adopting them. Moreover, the number of model parameters affects the complexity of the inferences, while a fast computation of the queries is needed for real-time feedback. We advocate logical gates with uncertainty for a compact parametrization of the conditional probability tables in the underlying Bayesian net used by tutoring systems. We discuss the semantics of the model parameters to elicit and the assumptions required to apply such approach in this domain. We also derive a dedicated inference scheme to speed up computations.
Variational Search Distributions
Steinberg, Daniel M., Oliveira, Rafael, Ong, Cheng Soon, Bonilla, Edwin V.
We develop variational search distributions (VSD), a method for finding discrete, combinatorial designs of a rare desired class in a batch sequential manner with a fixed experimental budget. We formalize the requirements and desiderata for this problem and formulate a solution via variational inference that fulfill these. In particular, VSD uses off-the-shelf gradient based optimization routines, and can take advantage of scalable predictive models. We show that VSD can outperform existing baseline methods on a set of real sequence-design problems in various biological systems. We consider a variant of the active search problem (Garnett et al., 2012; Jiang et al., 2017; Vanchinathan et al., 2015), where we wish to find as many members (designs) of a rare desired class in a batch sequential manner with a fixed experimental budget. Examples of this are compounds that could be useful pharmaceutical drugs, or highly active enzymes for catalysing chemical reactions.
Zero-shot Outlier Detection via Prior-data Fitted Networks: Model Selection Bygone!
Shen, Yuchen, Wen, Haomin, Akoglu, Leman
Outlier detection (OD) has a vast literature as it finds numerous applications in environmental monitoring, cybersecurity, finance, and medicine to name a few. Being an inherently unsupervised task, model selection is a key bottleneck for OD (both algorithm and hyperparameter selection) without label supervision. There is a long list of techniques to choose from -- both classical algorithms and deep neural architectures -- and while several studies report their hyperparameter sensitivity, the literature is quite slim on unsupervised model selection -- limiting the effective use of OD in practice. In this paper we present FoMo-0D, for zero/0-shot OD exploring a transformative new direction that bypasses the hurdle of model selection altogether (!), thus breaking new ground. The fundamental idea behind FoMo-0D is the Prior-data Fitted Networks, recently introduced by Muller et al.(2022), which trains a Transformer model on a large body of synthetically generated data from a prior data distribution. In essence, FoMo-0D is a pretrained Foundation Model for zero/0-shot OD on tabular data, which can directly predict the (outlier/inlier) label of any test data at inference time, by merely a single forward pass -- making obsolete the need for choosing an algorithm/architecture, tuning its associated hyperparameters, and even training any model parameters when given a new OD dataset. Extensive experiments on 57 public benchmark datasets against 26 baseline methods show that FoMo-0D performs statistically no different from the top 2nd baseline, while significantly outperforming the majority of the baselines, with an average inference time of 7.7 ms per test sample.
Explainable Malware Analysis: Concepts, Approaches and Challenges
Manthena, Harikha, Shajarian, Shaghayegh, Kimmell, Jeffrey, Abdelsalam, Mahmoud, Khorsandroo, Sajad, Gupta, Maanak
Machine learning (ML) has seen exponential growth in recent years, finding applications in various domains such as finance, medicine, and cybersecurity. Malware remains a significant threat to modern computing, frequently used by attackers to compromise systems. While numerous machine learning-based approaches for malware detection achieve high performance, they often lack transparency and fail to explain their predictions. This is a critical drawback in malware analysis, where understanding the rationale behind detections is essential for security analysts to verify and disseminate information. Explainable AI (XAI) addresses this issue by maintaining high accuracy while producing models that provide clear, understandable explanations for their decisions. In this survey, we comprehensively review the current state-of-the-art ML-based malware detection techniques and popular XAI approaches. Additionally, we discuss research implementations and the challenges of explainable malware analysis. This theoretical survey serves as an entry point for researchers interested in XAI applications in malware detection. By analyzing recent advancements in explainable malware analysis, we offer a broad overview of the progress in this field, positioning our work as the first to extensively cover XAI methods for malware classification and detection.
Improving Tree Probability Estimation with Stochastic Optimization and Variance Reduction
Xie, Tianyu, Yuan, Musu, Deng, Minghua, Zhang, Cheng
Probability estimation of tree topologies is one of the fundamental tasks in phylogenetic inference. The recently proposed subsplit Bayesian networks (SBNs) provide a powerful probabilistic graphical model for tree topology probability estimation by properly leveraging the hierarchical structure of phylogenetic trees. However, the expectation maximization (EM) method currently used for learning SBN parameters does not scale up to large data sets. In this paper, we introduce several computationally efficient methods for training SBNs and show that variance reduction could be the key for better performance. Furthermore, we also introduce the variance reduction technique to improve the optimization of SBN parameters for variational Bayesian phylogenetic inference (VBPI). Extensive synthetic and real data experiments demonstrate that our methods outperform previous baseline methods on the tasks of tree topology probability estimation as well as Bayesian phylogenetic inference using SBNs.
Empowering Bayesian Neural Networks with Functional Priors through Anchored Ensembling for Mechanics Surrogate Modeling Applications
Ghorbanian, Javad, Casaprima, Nicholas, Olivier, Audrey
In recent years, neural networks (NNs) have become increasingly popular for surrogate modeling tasks in mechanics and materials modeling applications. While traditional NNs are deterministic functions that rely solely on data to learn the input--output mapping, casting NN training within a Bayesian framework allows to quantify uncertainties, in particular epistemic uncertainties that arise from lack of training data, and to integrate a priori knowledge via the Bayesian prior. However, the high dimensionality and non-physicality of the NN parameter space, and the complex relationship between parameters (NN weights) and predicted outputs, renders both prior design and posterior inference challenging. In this work we present a novel BNN training scheme based on anchored ensembling that can integrate a priori information available in the function space, from e.g. low-fidelity models. The anchoring scheme makes use of low-rank correlations between NN parameters, learnt from pre-training to realizations of the functional prior. We also perform a study to demonstrate how correlations between NN weights, which are often neglected in existing BNN implementations, is critical to appropriately transfer knowledge between the function-space and parameter-space priors. Performance of our novel BNN algorithm is first studied on a small 1D example to illustrate the algorithm's behavior in both interpolation and extrapolation settings. Then, a thorough assessment is performed on a multi--input--output materials surrogate modeling example, where we demonstrate the algorithm's capabilities both in terms of accuracy and quality of the uncertainty estimation, for both in-distribution and out-of-distribution data.
Unlocking Potential Binders: Multimodal Pretraining DEL-Fusion for Denoising DNA-Encoded Libraries
Gu, Chunbin, He, Mutian, Cao, Hanqun, Chen, Guangyong, Hsieh, Chang-yu, Heng, Pheng Ann
In the realm of drug discovery, DNA-encoded library (DEL) screening technology has emerged as an efficient method for identifying high-affinity compounds. However, DEL screening faces a significant challenge: noise arising from nonspecific interactions within complex biological systems. Neural networks trained on DEL libraries have been employed to extract compound features, aiming to denoise the data and uncover potential binders to the desired therapeutic target. Nevertheless, the inherent structure of DEL, constrained by the limited diversity of building blocks, impacts the performance of compound encoders. Moreover, existing methods only capture compound features at a single level, further limiting the effectiveness of the denoising strategy. To mitigate these issues, we propose a Multimodal Pretraining DEL-Fusion model (MPDF) that enhances encoder capabilities through pretraining and integrates compound features across various scales. We develop pretraining tasks applying contrastive objectives between different compound representations and their text descriptions, enhancing the compound encoders' ability to acquire generic features. Furthermore, we propose a novel DEL-fusion framework that amalgamates compound information at the atomic, submolecular, and molecular levels, as captured by various compound encoders. The synergy of these innovations equips MPDF with enriched, multi-scale features, enabling comprehensive downstream denoising. Evaluated on three DEL datasets, MPDF demonstrates superior performance in data processing and analysis for validation tasks. Notably, MPDF offers novel insights into identifying high-affinity molecules, paving the way for improved DEL utility in drug discovery.
Amortized Bayesian Workflow (Extended Abstract)
Schmitt, Marvin, Li, Chengkun, Vehtari, Aki, Acerbi, Luigi, Bürkner, Paul-Christian, Radev, Stefan T.
Bayesian inference often faces a trade-off between computational speed and sampling accuracy. We propose an adaptive workflow that integrates rapid amortized inference with gold-standard MCMC techniques to achieve both speed and accuracy when performing inference on many observed datasets. Our approach uses principled diagnostics to guide the choice of inference method for each dataset, moving along the Pareto front from fast amortized sampling to slower but guaranteed-accurate MCMC when necessary. By reusing computations across steps, our workflow creates synergies between amortized and MCMC-based inference. We demonstrate the effectiveness of this integrated approach on a generalized extreme value task with 1000 observed data sets, showing 90x time efficiency gains while maintaining high posterior quality.