Goto

Collaborating Authors

 Bayesian Learning


Rao-Blackwellising Bayesian Causal Inference

arXiv.org Machine Learning

Bayesian causal inference, i.e., inferring a posterior over causal models for the use in downstream causal reasoning tasks, poses a hard computational inference problem that is little explored in literature. In this work, we combine techniques from order-based MCMC structure learning with recent advances in gradient-based graph learning into an effective Bayesian causal inference framework. Specifically, we decompose the problem of inferring the causal structure into (i) inferring a topological order over variables and (ii) inferring the parent sets for each variable. When limiting the number of parents per variable, we can exactly marginalise over the parent sets in polynomial time. We further use Gaussian processes to model the unknown causal mechanisms, which also allows their exact marginalisation. This introduces a Rao-Blackwellization scheme, where all components are eliminated from the model, except for the causal order, for which we learn a distribution via gradient-based optimisation. The combination of Rao-Blackwellization with our sequential inference procedure for causal orders yields state-of-the-art on linear and non-linear additive noise benchmarks with scale-free and Erdos-Renyi graph structures.


Bayesian Off-Policy Evaluation and Learning for Large Action Spaces

arXiv.org Machine Learning

In interactive systems, actions are often correlated, presenting an opportunity for more sample-efficient off-policy evaluation (OPE) and learning (OPL) in large action spaces. We introduce a unified Bayesian framework to capture these correlations through structured and informative priors. In this framework, we propose sDM, a generic Bayesian approach designed for OPE and OPL, grounded in both algorithmic and theoretical foundations. Notably, sDM leverages action correlations without compromising computational efficiency. Moreover, inspired by online Bayesian bandits, we introduce Bayesian metrics that assess the average performance of algorithms across multiple problem instances, deviating from the conventional worst-case assessments. We analyze sDM in OPE and OPL, highlighting the benefits of leveraging action correlations. Empirical evidence showcases the strong performance of sDM.


Learning under Singularity: An Information Criterion improving WBIC and sBIC

arXiv.org Machine Learning

We introduce a novel Information Criterion (IC), termed Learning under Singularity (LS), designed to enhance the functionality of the Widely Applicable Bayes Information Criterion (WBIC) and the Singular Bayesian Information Criterion (sBIC). LS is effective without regularity constraints and demonstrates stability. Watanabe defined a statistical model or a learning machine as regular if the mapping from a parameter to a probability distribution is one-to-one and its Fisher information matrix is positive definite. In contrast, models not meeting these conditions are termed singular. Over the past decade, several information criteria for singular cases have been proposed, including WBIC and sBIC. WBIC is applicable in non-regular scenarios but faces challenges with large sample sizes and redundant estimation of known learning coefficients. Conversely, sBIC is limited in its broader application due to its dependence on maximum likelihood estimates. LS addresses these limitations by enhancing the utility of both WBIC and sBIC. It incorporates the empirical loss from the Widely Applicable Information Criterion (WAIC) to represent the goodness of fit to the statistical model, along with a penalty term similar to that of sBIC. This approach offers a flexible and robust method for model selection, free from regularity constraints.


BlackJAX: Composable Bayesian inference in JAX

arXiv.org Machine Learning

BlackJAX is a library implementing sampling and variational inference algorithms commonly used in Bayesian computation. It is designed for ease of use, speed, and modularity by taking a functional approach to the algorithms' implementation. BlackJAX is written in Python, using JAX to compile and run NumpPy-like samplers and variational methods on CPUs, GPUs, and TPUs. The library integrates well with probabilistic programming languages by working directly with the (un-normalized) target log density function. BlackJAX is intended as a collection of low-level, composable implementations of basic statistical 'atoms' that can be combined to perform well-defined Bayesian inference, but also provides high-level routines for ease of use. It is designed for users who need cutting-edge methods, researchers who want to create complex sampling methods, and people who want to learn how these work.


Effects of term weighting approach with and without stop words removing on Arabic text classification

arXiv.org Artificial Intelligence

Classifying text is a method for categorizing documents into pre-established groups. Text documents must be prepared and represented in a way that is appropriate for the algorithms used for data mining prior to classification. As a result, a number of term weighting strategies have been created in the literature to enhance text categorization algorithms' functionality. This study compares the effects of Binary and Term frequency weighting feature methodologies on the text's classification method when stop words are eliminated once and when they are not. In recognition of assessing the effects of prior weighting of features approaches on classification results in terms of accuracy, recall, precision, and F-measure values, we used an Arabic data set made up of 322 documents divided into six main topics (agriculture, economy, health, politics, science, and sport), each of which contains 50 documents, with the exception of the health category, which contains 61 documents. The results demonstrate that for all metrics, the term frequency feature weighting approach with stop word removal outperforms the binary approach, while for accuracy, recall, and F-Measure, the binary approach outperforms the TF approach without stop word removal. However, for precision, the two approaches produce results that are very similar. Additionally, it is clear from the data that, using the same phrase weighting approach, stop word removing increases classification accuracy.


Generative Probabilistic Time Series Forecasting and Applications in Grid Operations

arXiv.org Artificial Intelligence

The main challenge of applying Wiener-Kallianpur innovation Whereas standard probabilistic forecasting aims to estimate representation for inference and decision-making is the conditional probability distribution of the time series at twofold. First, obtaining a causal encoder to extract the a future time, GPF obtains a generative model capable of innovation process requires knowing the marginal and joint producing arbitrarily many Monte Carlo samples of future distributions of the time series, which is rarely possible without time series realizations according to the conditional probability imposing some parametric structure. Furthermore, even when distribution of the time series given past observations. As the probability distribution is known, there is no known computationally a nonparametric probabilistic forecasting technique, GPF is tractable way to construct the causal encoder to essential for decision-making under uncertainty where datadriven obtain an innovation process. Second, the Wiener-Kallianpur risk-sensitive optimization requires conditional samples innovation representation may not exist for a broad class of of future randomness. The Monte Carlo samples generated random processes, including some of the important cases of from GPF can be used to produce any form of point forecast.


Semirings for Probabilistic and Neuro-Symbolic Logic Programming

arXiv.org Artificial Intelligence

The original framework of Poole and Sato extended the logic programming language Prolog (Flach, 1994) with probabilistic facts. These are facts that are annotated with the probability that they are true; they play a role similar to the parentless nodes in Bayesian networks in that they are marginally independent of one another, and that the probabilistic dependencies are induced by the rules of the logic program. This resulted in the celebrated distribution semantics (Sato, 1995) that is the basis of probabilistic logic programming, and the corresponding learning algorithm in the PRISM language (Sato, 1995) constitutes - to the best of the authors' knowledge - the very first probabilistic programming language with built-in support for machine learning. The work of Sato and Poole has inspired many follow-up works on inference and learning, and has also introduced many variations and extensions of the probabilistic logic programming and its celebrated distribution semantics.


Probabilistic Neural Networks (PNNs) for Modeling Aleatoric Uncertainty in Scientific Machine Learning

arXiv.org Machine Learning

This paper investigates the use of probabilistic neural networks (PNNs) to model aleatoric uncertainty, which refers to the inherent variability in the input-output relationships of a system, often characterized by unequal variance or heteroscedasticity. Unlike traditional neural networks that produce deterministic outputs, PNNs generate probability distributions for the target variable, allowing the determination of both predicted means and intervals in regression scenarios. Contributions of this paper include the development of a probabilistic distance metric to optimize PNN architecture, and the deployment of PNNs in controlled data sets as well as a practical material science case involving fiber-reinforced composites. The findings confirm that PNNs effectively model aleatoric uncertainty, proving to be more appropriate than the commonly employed Gaussian process regression for this purpose. Specifically, in a real-world scientific machine learning context, PNNs yield remarkably accurate output mean estimates with R-squared scores approaching 0.97, and their predicted intervals exhibit a high correlation coefficient of nearly 0.80, closely matching observed data intervals. Hence, this research contributes to the ongoing exploration of leveraging the sophisticated representational capacity of neural networks to delineate complex input-output relationships in scientific problems.


A unified Bayesian framework for interval hypothesis testing in clinical trials

arXiv.org Machine Learning

The American Statistical Association (ASA) statement on statistical significance and P-values \cite{wasserstein2016asa} cautioned statisticians against making scientific decisions solely on the basis of traditional P-values. The statement delineated key issues with P-values, including a lack of transparency, an inability to quantify evidence in support of the null hypothesis, and an inability to measure the size of an effect or the importance of a result. In this article, we demonstrate that the interval null hypothesis framework (instead of the point null hypothesis framework), when used in tandem with Bayes factor-based tests, is instrumental in circumnavigating the key issues of P-values. Further, we note that specifying prior densities for Bayes factors is challenging and has been a reason for criticism of Bayesian hypothesis testing in existing literature. We address this by adapting Bayes factors directly based on common test statistics. We demonstrate, through numerical experiments and real data examples, that the proposed Bayesian interval hypothesis testing procedures can be calibrated to ensure frequentist error control while retaining their inherent interpretability. Finally, we illustrate the improved flexibility and applicability of the proposed methods by providing coherent frameworks for competitive landscape analysis and end-to-end Bayesian hypothesis tests in the context of reporting clinical trial outcomes.


Guarantee Regions for Local Explanations

arXiv.org Artificial Intelligence

Interpretability methods that utilise local surrogate models (e.g. LIME) are very good at describing the behaviour of the predictive model at a point of interest, but they are not guaranteed to extrapolate to the local region surrounding the point. However, overfitting to the local curvature of the predictive model and malicious tampering can significantly limit extrapolation. We propose an anchor-based algorithm for identifying regions in which local explanations are guaranteed to be correct by explicitly describing those intervals along which the input features can be trusted. Our method produces an interpretable feature-aligned box where the prediction of the local surrogate model is guaranteed to match the predictive model. We demonstrate that our algorithm can be used to find explanations with larger guarantee regions that better cover the data manifold compared to existing baselines. We also show how our method can identify misleading local explanations with significantly poorer guarantee regions.