Goto

Collaborating Authors

 Bayesian Learning


FAB-PPI: Frequentist, Assisted by Bayes, Prediction-Powered Inference

arXiv.org Machine Learning

Prediction-powered inference (PPI) enables valid statistical inference by combining experimental data with machine learning predictions. When a sufficient number of high-quality predictions is available, PPI results in more accurate estimates and tighter confidence intervals than traditional methods. In this paper, we propose to inform the PPI framework with prior knowledge on the quality of the predictions. The resulting method, which we call frequentist, assisted by Bayes, PPI (FAB-PPI), improves over PPI when the observed prediction quality is likely under the prior, while maintaining its frequentist guarantees. Furthermore, when using heavy-tailed priors, FAB-PPI adaptively reverts to standard PPI in low prior probability regions. We demonstrate the benefits of FAB-PPI in real and synthetic examples.


Beyond Topological Self-Explainable GNNs: A Formal Explainability Perspective

arXiv.org Artificial Intelligence

Self-Explainable Graph Neural Networks (SE-GNNs) are popular explainable-by-design GNNs, but the properties and the limitations of their explanations are not well understood. Our first contribution fills this gap by formalizing the explanations extracted by SE-GNNs, referred to as Trivial Explanations (TEs), and comparing them to established notions of explanations, namely Prime Implicant (PI) and faithful explanations. Our analysis reveals that TEs match PI explanations for a restricted but significant family of tasks. In general, however, they can be less informative than PI explanations and are surprisingly misaligned with widely accepted notions of faithfulness. Although faithful and PI explanations are informative, they are intractable to find and we show that they can be prohibitively large. Motivated by this, we propose Dual-Channel GNNs that integrate a white-box rule extractor and a standard SE-GNN, adaptively combining both channels when the task benefits. Our experiments show that even a simple instantiation of Dual-Channel GNNs can recover succinct rules and perform on par or better than widely used SE-GNNs. Our code can be found in the supplementary material.


Practically Effective Adjustment Variable Selection in Causal Inference

arXiv.org Artificial Intelligence

In the estimation of causal effects, one common method for removing the influence of confounders is to adjust the variables that satisfy the back-door criterion. However, it is not always possible to uniquely determine sets of such variables. Moreover, real-world data is almost always limited, which means it may be insufficient for statistical estimation. Therefore, we propose criteria for selecting variables from a list of candidate adjustment variables along with an algorithm to prevent accuracy degradation in causal effect estimation. We initially focus on directed acyclic graphs (DAGs) and then outlines specific steps for applying this method to completed partially directed acyclic graphs (CPDAGs). We also present and prove a theorem on causal effect computation possibility in CPDAGs. Finally, we demonstrate the practical utility of our method using both existing and artificial data.


Learning Hyperparameters via a Data-Emphasized Variational Objective

arXiv.org Machine Learning

When training large flexible models, practitioners often rely on grid search to select hyperparameters that control over-fitting. This grid search has several disadvantages: the search is computationally expensive, requires carving out a validation set that reduces the available data for training, and requires users to specify candidate values. In this paper, we propose an alternative: directly learning regularization hyperparameters on the full training set via the evidence lower bound ("ELBo") objective from variational methods. For deep neural networks with millions of parameters, we recommend a modified ELBo that upweights the influence of the data likelihood relative to the prior. Our proposed technique overcomes all three disadvantages of grid search. In a case study on transfer learning of image classifiers, we show how our method reduces the 88+ hour grid search of past work to under 3 hours while delivering comparable accuracy. We further demonstrate how our approach enables efficient yet accurate approximations of Gaussian processes with learnable length-scale kernels.


Federated Learning with Discriminative Naive Bayes Classifier

arXiv.org Artificial Intelligence

Federated Learning has emerged as a promising approach to train machine learning models on decentralized data sources while preserving data privacy. This paper proposes a new federated approach for Naive Bayes (NB) classification, assuming discrete variables. Our approach federates a discriminative variant of NB, sharing meaningless parameters instead of conditional probability tables. Therefore, this process is more reliable against possible attacks. We conduct extensive experiments on 12 datasets to validate the efficacy of our approach, comparing federated and non-federated settings. Additionally, we benchmark our method against the generative variant of NB, which serves as a baseline for comparison. Our experimental results demonstrate the effectiveness of our method in achieving accurate classification.


FedGES: A Federated Learning Approach for BN Structure Learning

arXiv.org Artificial Intelligence

Bayesian Network (BN) structure learning traditionally centralizes data, raising privacy concerns when data is distributed across multiple entities. This research introduces Federated GES (FedGES), a novel Federated Learning approach tailored for BN structure learning in decentralized settings using the Greedy Equivalence Search (GES) algorithm. FedGES uniquely addresses privacy and security challenges by exchanging only evolving network structures, not parameters or data. It realizes collaborative model development, using structural fusion to combine the limited models generated by each client in successive iterations. A controlled structural fusion is also proposed to enhance client consensus when adding any edge.


Estimating Network Models using Neural Networks

arXiv.org Machine Learning

Exponential random graph models (ERGMs) are very flexible for modeling network formation but pose difficult estimation challenges due to their intractable normalizing constant. Existing methods, such as MCMC-MLE, rely on sequential simulation at every optimization step. We propose a neural network approach that trains on a single, large set of parameter-simulation pairs to learn the mapping from parameters to average network statistics. Once trained, this map can be inverted, yielding a fast and parallelizable estimation method. The procedure also accommodates extra network statistics to mitigate model misspecification. Some simple illustrative examples show that the method performs well in practice.


What is causal about causal models and representations?

arXiv.org Machine Learning

Causal Bayesian networks are 'causal' models since they make predictions about interventional distributions. To connect such causal model predictions to real-world outcomes, we must determine which actions in the world correspond to which interventions in the model. For example, to interpret an action as an intervention on a treatment variable, the action will presumably have to a) change the distribution of treatment in a way that corresponds to the intervention, and b) not change other aspects, such as how the outcome depends on the treatment; while the marginal distributions of some variables may change as an effect. We introduce a formal framework to make such requirements for different interpretations of actions as interventions precise. We prove that the seemingly natural interpretation of actions as interventions is circular: Under this interpretation, every causal Bayesian network that correctly models the observational distribution is trivially also interventionally valid, and no action yields empirical data that could possibly falsify such a model. We prove an impossibility result: No interpretation exists that is non-circular and simultaneously satisfies a set of natural desiderata. Instead, we examine non-circular interpretations that may violate some desiderata and show how this may in turn enable the falsification of causal models. By rigorously examining how a causal Bayesian network could be a 'causal' model of the world instead of merely a mathematical object, our formal framework contributes to the conceptual foundations of causal representation learning, causal discovery, and causal abstraction, while also highlighting some limitations of existing approaches.


Efficient Prior Selection in Gaussian Process Bandits with Thompson Sampling

arXiv.org Machine Learning

Gaussian process (GP) bandits provide a powerful framework for solving blackbox optimization of unknown functions. The characteristics of the unknown function depends heavily on the assumed GP prior. Most work in the literature assume that this prior is known but in practice this seldom holds. Instead, practitioners often rely on maximum likelihood estimation to select the hyperparameters of the prior - which lacks theoretical guarantees. In this work, we propose two algorithms for joint prior selection and regret minimization in GP bandits based on GP Thompson sampling (GP-TS): Prior-Elimination GP-TS (PE-GP-TS) and HyperPrior GP-TS (HP-GP-TS). We theoretically analyze the algorithms and establish upper bounds for their respective regret. In addition, we demonstrate the effectiveness of our algorithms compared to the alternatives through experiments with synthetic and real-world data.


Enhancing Bayesian Network Structural Learning with Monte Carlo Tree Search

arXiv.org Artificial Intelligence

This article presents MCTS-BN, an adaptation of the Monte Carlo Tree Search (MCTS) algorithm for the structural learning of Bayesian Networks (BNs). Initially designed for game tree exploration, MCTS has been repurposed to address the challenge of learning BN structures by exploring the search space of potential ancestral orders in Bayesian Networks. Then, it employs Hill Climbing (HC) to derive a Bayesian Network structure from each order. In large BNs, where the search space for variable orders becomes vast, using completely random orders during the rollout phase is often unreliable and impractical. We adopt a semi-randomized approach to address this challenge by incorporating variable orders obtained from other heuristic search algorithms such as Greedy Equivalent Search (GES), PC, or HC itself. This hybrid strategy mitigates the computational burden and enhances the reliability of the rollout process. Experimental evaluations demonstrate the effectiveness of MCTS-BN in improving BNs generated by traditional structural learning algorithms, exhibiting robust performance even when base algorithm orders are suboptimal and surpassing the gold standard when provided with favorable orders.