Bayesian Learning
A Bayesian framework for discovering interpretable Lagrangian of dynamical systems from data
Tripura, Tapas, Chakraborty, Souvik
Learning and predicting the dynamics of physical systems requires a profound understanding of the underlying physical laws. Recent works on learning physical laws involve generalizing the equation discovery frameworks to the discovery of the Hamiltonian and Lagrangian of physical systems. While the existing methods parameterize the Lagrangian using neural networks, we propose an alternate framework for learning interpretable Lagrangian descriptions of physical systems from limited data using the sparse Bayesian approach. Unlike existing neural network-based approaches, the proposed approach (a) yields an interpretable description of the Lagrangian, (b) exploits Bayesian learning to quantify the epistemic uncertainty due to limited data, (c) automates the distillation of the Hamiltonian from the learned Lagrangian using the Legendre transformation, and (d) provides ordinary (ODE) and partial differential equation (PDE) based descriptions of the observed systems. Six different examples involving both discrete and continuous systems illustrate the efficacy of the proposed approach.
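The Legendre transformation mentioned in item (c) can be illustrated with a minimal sketch. It is not the authors' implementation: the harmonic-oscillator Lagrangian, the parameter values `m` and `k`, and the function names are assumptions chosen for illustration. The conjugate momentum is p = dL/d(qdot), approximated here by a central difference, and the Hamiltonian is H = p * qdot - L.

```python
def lagrangian(q, qdot, m=2.0, k=3.0):
    """Hypothetical example: harmonic oscillator, L = (1/2) m qdot^2 - (1/2) k q^2."""
    return 0.5 * m * qdot**2 - 0.5 * k * q**2


def conjugate_momentum(L, q, qdot, h=1e-5):
    """Approximate p = dL/d(qdot) with a central finite difference."""
    return (L(q, qdot + h) - L(q, qdot - h)) / (2.0 * h)


def hamiltonian_from_lagrangian(L, q, qdot):
    """Legendre transform evaluated at one state: H = p * qdot - L."""
    p = conjugate_momentum(L, q, qdot)
    return p, p * qdot - L(q, qdot)


p, H = hamiltonian_from_lagrangian(lagrangian, q=1.0, qdot=2.0)
# Closed form for this example: H = p^2 / (2 m) + (1/2) k q^2
H_closed_form = p**2 / (2.0 * 2.0) + 0.5 * 3.0 * 1.0**2
```

For this quadratic Lagrangian the numerically transformed `H` agrees with the closed-form total energy. A symbolic tool would be needed to obtain H as an explicit expression in (q, p).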
Learning Layer-wise Equivariances Automatically using Gradients
van der Ouderaa, Tycho F. A., Immer, Alexander, van der Wilk, Mark
However, symmetries provide fixed hard constraints on the functions a network can represent, need to be specified in advance, and cannot be adapted. Our goal is to allow flexible symmetry constraints that can automatically be learned from data using gradients. Learning symmetry and associated weight connectivity structures from scratch is difficult for two reasons. First, it requires efficient and flexible parameterisations of layer-wise equivariances. Secondly, symmetries act as constraints and are therefore not encouraged by training losses measuring data fit. To overcome these challenges, we improve parameterisations of soft equivariance and learn the amount of equivariance in layers by optimising the marginal likelihood, estimated using differentiable Laplace approximations. The objective balances data fit and model complexity, enabling layer-wise symmetry discovery in deep networks. We demonstrate the ability to automatically learn layer-wise equivariances on image classification tasks, achieving equivalent or improved performance over baselines with hard-coded symmetry.
Quantifying Uncertainty in Deep Learning Classification with Noise in Discrete Inputs for Risk-Based Decision Making
Kheirandish, Maryam, Zhang, Shengfan, Catanzaro, Donald G., Crudu, Valeriu
The use of Deep Neural Network (DNN) models in risk-based decision-making has attracted extensive attention, with broad applications in medicine, finance, manufacturing, and quality control. To mitigate prediction-related risks in decision making, prediction confidence or uncertainty should be assessed alongside the overall performance of algorithms. Recent studies on Bayesian deep learning help quantify the prediction uncertainty that arises from input noise and model parameters. However, the normality assumption on input noise in these models limits their applicability to problems involving categorical and discrete feature variables in tabular datasets. In this paper, we propose a mathematical framework to quantify prediction uncertainty for DNN models. The prediction uncertainty arises from errors in predictors that follow some known finite discrete distribution. We then conduct a case study using the framework to predict treatment outcomes for tuberculosis patients during their course of treatment. The results demonstrate that, under a certain level of risk, we can identify risk-sensitive cases, which are prone to be misclassified due to errors in predictors. Compared to the Monte Carlo dropout method, our proposed framework is more aware of misclassification cases. Our proposed framework for uncertainty quantification in deep learning can support risk-based decision making in applications where discrete errors in predictors are present.
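The idea of propagating a known finite discrete error distribution through a classifier can be sketched as below. This is an illustrative assumption, not the framework from the paper: `toy_classifier` is a hypothetical stand-in for a trained DNN, the features are assumed independent, and the per-feature distributions over true values are made-up numbers. Because the error distribution is finite, every combination of true values can be enumerated exactly.

```python
import itertools
import math


def toy_classifier(x):
    """Hypothetical stand-in for a trained DNN: returns P(y = 1 | x)."""
    z = -1.0 + 1.5 * x[0] + 0.8 * x[1]
    return 1.0 / (1.0 + math.exp(-z))


def propagate_discrete_noise(predict, true_value_dists):
    """Enumerate every combination of possible true feature values.

    true_value_dists holds one dict {value: probability} per feature,
    describing the true value given the recorded (possibly wrong) value.
    Features are assumed independent (an assumption of this sketch).
    Returns the mean and variance of the predicted probability and the
    probability mass of inputs classified as positive.
    """
    mean = second_moment = p_positive = 0.0
    for combo in itertools.product(*(d.items() for d in true_value_dists)):
        x = [value for value, _ in combo]
        weight = math.prod(prob for _, prob in combo)
        y = predict(x)
        mean += weight * y
        second_moment += weight * y * y
        if y >= 0.5:
            p_positive += weight
    variance = max(second_moment - mean**2, 0.0)
    return mean, variance, p_positive


# Feature 0 was recorded as 1 but is truly 0 with probability 0.1;
# feature 1 was recorded as 0 but is truly 1 with probability 0.2.
dists = [{1: 0.9, 0: 0.1}, {0: 0.8, 1: 0.2}]
mean, variance, p_positive = propagate_discrete_noise(toy_classifier, dists)
```

A case could then be flagged as risk-sensitive when `p_positive` is far from both 0 and 1, meaning that plausible recording errors would flip the predicted class.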
Cost-sensitive probabilistic predictions for support vector machines
Benítez-Peña, Sandra, Blanquero, Rafael, Carrizosa, Emilio, Ramírez-Cobo, Pepa
Support vector machines (SVMs) are widely used and constitute one of the most thoroughly examined machine learning models for two-class classification. Classification in SVM is based on a score procedure, yielding a deterministic classification rule, which can be transformed into a probabilistic rule (as implemented in off-the-shelf SVM libraries), but is not probabilistic in nature. On the other hand, the tuning of the regularization parameters in SVM is known to imply a high computational effort and generates pieces of information that are not fully exploited, since they are not used to build a probabilistic classification rule. In this paper we propose a novel approach to generate probabilistic outputs for the SVM. The new method has the following three properties. First, it is designed to be cost-sensitive, and thus the different importance of sensitivity (or true positive rate, TPR) and specificity (true negative rate, TNR) is readily accommodated in the model. As a result, the model can deal with imbalanced datasets, which are common in operational business problems such as churn prediction or credit scoring. Second, the SVM is embedded in an ensemble method to improve its performance, making use of the valuable information generated in the parameter tuning process. Finally, the probability estimation is done via bootstrap estimates, avoiding the parametric models used by competing approaches. Numerical tests on a wide range of datasets show the advantages of our approach over benchmark procedures.
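The general idea of turning a deterministic classification rule into a probability through bootstrap estimates can be sketched as follows. This is not the authors' procedure: to keep the example self-contained, the SVM is replaced by a hypothetical one-dimensional midpoint rule, and the data, `n_boot`, and `seed` are illustrative assumptions. The estimated probability is the fraction of bootstrap-trained rules that classify the new point as positive.

```python
import random


def fit_midpoint_rule(sample):
    """Stand-in for an SVM: threshold halfway between the two class means.

    Returns None when the resample happens to contain only one class.
    """
    neg = [x for x, y in sample if y == 0]
    pos = [x for x, y in sample if y == 1]
    if not neg or not pos:
        return None
    return 0.5 * (sum(neg) / len(neg) + sum(pos) / len(pos))


def bootstrap_probability(data, x_new, n_boot=200, seed=0):
    """Fraction of bootstrap-trained rules that label x_new as positive."""
    rng = random.Random(seed)
    votes = 0
    fitted = 0
    while fitted < n_boot:
        # Resample the training set with replacement.
        resample = [rng.choice(data) for _ in data]
        threshold = fit_midpoint_rule(resample)
        if threshold is None:
            continue  # degenerate resample, draw again
        fitted += 1
        votes += x_new > threshold
    return votes / n_boot


data = [(x, 0) for x in (0.0, 0.5, 1.0, 1.5, 2.0)] + \
       [(x, 1) for x in (2.5, 3.0, 3.5, 4.0, 4.5)]
p_far_positive = bootstrap_probability(data, 10.0)
p_far_negative = bootstrap_probability(data, -5.0)
p_borderline = bootstrap_probability(data, 2.25)
```

Points far from the decision boundary receive probabilities of 1 or 0, while a borderline point receives an intermediate value that reflects how often resampling moves the boundary across it. No parametric calibration model is fitted.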
Post-hoc Bias Scoring Is Optimal For Fair Classification
Chen, Wenlong, Klochkov, Yegor, Liu, Yang
We consider a binary classification problem under group fairness constraints, which can be one of Demographic Parity (DP), Equalized Opportunity (EOp), or Equalized Odds (EO). We propose an explicit characterization of the Bayes optimal classifier under the fairness constraints, which turns out to be a simple modification rule of the unconstrained classifier. Namely, we introduce a novel instance-level measure of bias, which we call bias score, and the modification rule is a simple linear rule on top of a finite number of bias scores. Based on this characterization, we develop a post-hoc approach that allows us to adapt to fairness constraints while maintaining high accuracy. In the case of DP and EOp constraints, the modification rule is thresholding a single bias score, while in the case of EO constraints we are required to fit a linear modification rule with 2 parameters. The method can also be applied to composite group-fairness criteria, such as ones involving several sensitive attributes. We achieve competitive or better performance compared to both in-processing and post-processing methods across three datasets: Adult, COMPAS, and CelebA. Unlike most post-processing methods, we do not require access to sensitive attributes at inference time. Significant improvements have been made in classification tasks using machine learning (ML) algorithms. With ML algorithms being deployed in more and more decision-making applications, it is crucial to ensure fairness in their predictions. Although the debate on what fairness is and how to measure it is ongoing (Caton & Haas, 2023), group fairness measures are often used in practice due to the simplicity of their verification (Chouldechova, 2017; Hardt et al., 2016a); they conform to the intuition that predictions should not be biased toward a specific group of the population.
Causal structure learning with momentum: Sampling distributions over Markov Equivalence Classes of DAGs
Schauer, Moritz, Wienöbst, Marcel
In the context of inferring a Bayesian network structure (directed acyclic graph, DAG for short), we devise a non-reversible continuous time Markov chain, the "Causal Zig-Zag sampler", that targets a probability distribution over classes of observationally equivalent (Markov equivalent) DAGs. The classes are represented as completed partially directed acyclic graphs (CPDAGs). The non-reversible Markov chain relies on the operators used in Chickering's Greedy Equivalence Search (GES) and is endowed with a momentum variable, which improves mixing significantly as we show empirically. The possible target distributions include posterior distributions based on a prior over DAGs and a Markov equivalent likelihood. We offer an efficient implementation wherein we develop new algorithms for listing, counting, uniformly sampling, and applying possible moves of the GES operators, all of which significantly improve upon the state-of-the-art.
Bayesian Optimisation for Sequential Experimental Design with Applications in Additive Manufacturing
Zhang, Mimi, Parnell, Andrew, Brabazon, Dermot, Benavoli, Alessio
Engineering designs are usually performed under strict budget constraints. Collecting a single datum from computer experiments such as computational fluid dynamics can potentially take weeks or months. Each datum obtained, whether from a simulation or a physical experiment, needs to be maximally informative of the goals we are trying to accomplish. It is thus crucial to decide where and how to collect the necessary data to learn most about the subject of study. Data-driven experimental design appears in many different contexts in chemistry and physics (e.g. Lam et al., 2018) where the design is an iterative process and the outcomes of previous experiments are exploited to make an informed selection of the next design to evaluate. Mathematically, it is often formulated as an optimization problem of a black-box function (that is, the input-output relation is complex and not analytically available). Bayesian optimization (BO) is a well-established technique for black-box optimization and is primarily used in situations where (1) the objective function is complex and does not have a closed form, (2) no gradient information is available, and (3) function evaluations are expensive (see Frazier, 2018, for a tutorial). BO has been shown to be sample-efficient in many domains (e.g.
Fast-DetectGPT: Efficient Zero-Shot Detection of Machine-Generated Text via Conditional Probability Curvature
Bao, Guangsheng, Zhao, Yanbin, Teng, Zhiyang, Yang, Linyi, Zhang, Yue
Table 4: Details of the source models that are used to produce machine-generated text. We assess the performance of our methodologies using text generations sourced from various models, as outlined in Table 4. These models are arranged in order of their parameter count, with those having fewer than 20 billion parameters being run locally on a Tesla A100 GPU (80G). For models with over 6 billion parameters, we employ half-precision (float16); otherwise, we use full-precision (float32). In the case of larger models like GPT-3, ChatGPT, and GPT-4, we utilize the OpenAI API for the evaluations. Additionally, we provide information about the training corpus associated with each model, which we believe is pertinent for understanding the detection accuracy of different sampling and scoring models when applied to text generations originating from diverse source models, domains, and languages.
Causally Disentangled Generative Variational AutoEncoder
An, Seunghwan, Song, Kyungwoo, Jeon, Jong-June
We present a new supervised learning technique for the Variational AutoEncoder (VAE) that allows it to learn a causally disentangled representation and generate causally disentangled outcomes simultaneously. We call this approach Causally Disentangled Generation (CDG). CDG is a generative model that accurately decodes an output based on a causally disentangled representation. Our research demonstrates that adding supervised regularization to the encoder alone is insufficient for achieving a generative model with CDG, even for a simple task. Therefore, we explore the necessary and sufficient conditions for achieving CDG within a specific model. Additionally, we introduce a universal metric for evaluating the causal disentanglement of a generative model. Empirical results from both image and tabular datasets support our findings.
Anytime-valid t-tests and confidence sequences for Gaussian means with unknown variance
Wang, Hongjian, Ramdas, Aaditya
In 1976, Lai constructed a nontrivial confidence sequence for the mean $\mu$ of a Gaussian distribution with unknown variance $\sigma^2$. Curiously, he employed both an improper (right Haar) mixture over $\sigma$ and an improper (flat) mixture over $\mu$. Here, we elaborate carefully on the details of his construction, which uses generalized nonintegrable martingales and an extended Ville's inequality. While this does yield a sequential t-test, it does not yield an "e-process" (due to the nonintegrability of his martingale). In this paper, we develop two new e-processes and confidence sequences for the same setting: one is a test martingale in a reduced filtration, while the other is an e-process in the canonical data filtration. These are respectively obtained by swapping Lai's flat mixture for a Gaussian mixture, and swapping the right Haar mixture over $\sigma$ with the maximum likelihood estimate under the null, as done in universal inference. We also analyze the width of the resulting confidence sequences, which have a curious dependence on the error probability $\alpha$. Numerical experiments are provided along the way to compare and contrast the various approaches.