Bayesian Inference


Posets and Bounded Probabilities for Discovering Order-inducing Features in Event Knowledge Graphs

arXiv.org Artificial Intelligence

Event knowledge graphs (EKGs) extend the classical notion of a trace to capture multiple, interacting views of a process execution. In this paper, we tackle the open problem of automating EKG discovery from uncurated data through a principled, probabilistic framing based on the outcome space resulting from feature-derived partial orders on events. From this, we derive an EKG discovery algorithm based on statistical inference rather than on an ad hoc or heuristic strategy, or on manual analysis by domain experts. This approach comes at the computational cost of exploring a large, non-convex hypothesis space. In particular, solving the maximum likelihood term involves counting the number of linear extensions of posets, which in general is #P-complete. Fortunately, bound estimates suffice for model comparison, and admit incorporation into a bespoke branch-and-bound algorithm. We show that the posterior probability as defined is antitonic w.r.t. search depth for branching rules that are monotonic w.r.t. model inclusion. This allows pruning of large portions of the search space, which we show experimentally leads to rapid convergence toward optimal solutions that are consistent with manually built EKGs.
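
To make the counting problem mentioned above concrete, here is a minimal sketch of exact linear-extension counting by dynamic programming over subsets of placed elements. It is not the paper's algorithm (which relies on bounds inside branch-and-bound); the function name `count_linear_extensions` and the encoding of the poset as `(a, b)` pairs are assumptions made for illustration. The running time is exponential in the number of elements, which is consistent with the #P-completeness noted in the abstract.

```python
from functools import lru_cache


def count_linear_extensions(n, relations):
    """Count the linear extensions of a poset on elements 0..n-1.

    relations: iterable of (a, b) pairs meaning "a must come before b".
    Hypothetical helper for illustration; exponential in n.
    """
    # preds[b] is a bitmask of the elements that must precede b.
    preds = [0] * n
    for a, b in relations:
        preds[b] |= 1 << a

    full = (1 << n) - 1

    @lru_cache(maxsize=None)
    def count(placed):
        # placed is a bitmask of elements already put into the ordering.
        if placed == full:
            return 1
        total = 0
        for x in range(n):
            not_placed = not placed & (1 << x)
            preds_done = preds[x] & ~placed == 0
            if not_placed and preds_done:
                total += count(placed | (1 << x))
        return total

    return count(0)


# Two independent chains 0 < 1 and 2 < 3 can be interleaved in 6 ways.
print(count_linear_extensions(4, [(0, 1), (2, 3)]))
```

An antichain on n elements has n! linear extensions and a chain has exactly one, so adding order relations can only shrink the count.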


Stabilized Neural Prediction of Potential Outcomes in Continuous Time

arXiv.org Artificial Intelligence

Patient trajectories from electronic health records are widely used to predict potential outcomes of treatments over time, which in turn allows care to be personalized. Yet, existing neural methods for this purpose have a key limitation: while some adjust for time-varying confounding, these methods assume that the time series are recorded in discrete time. In other words, they are constrained to settings where measurements and treatments are conducted at fixed time steps, even though this is unrealistic in medical practice. In this work, we aim to predict potential outcomes in continuous time. This is of direct practical relevance because it allows for modeling patient trajectories where measurements and treatments take place at arbitrary, irregular timestamps. We thus propose a new method called stabilized continuous time inverse propensity network (SCIP-Net). For this, we further derive stabilized inverse propensity weights for robust prediction of the potential outcomes. To the best of our knowledge, our SCIP-Net is the first neural method that performs proper adjustments for time-varying confounding in continuous time.


Bayesian Estimation and Tuning-Free Rank Detection for Probability Mass Function Tensors

arXiv.org Machine Learning

Obtaining a reliable estimate of the joint probability mass function (PMF) of a set of random variables from observed data is a significant objective in statistical signal processing and machine learning. Modelling the joint PMF as a tensor that admits a low-rank canonical polyadic decomposition (CPD) has enabled the development of efficient PMF estimation algorithms. However, these algorithms require the rank (model order) of the tensor to be specified beforehand. In real-world applications, the true rank is unknown. Therefore, an appropriate rank is usually selected from a candidate set either by observing validation errors or by computing various likelihood-based information criteria, a procedure which is computationally expensive for large datasets. This paper presents a novel Bayesian framework for estimating the joint PMF and automatically inferring its rank from observed data. We specify a Bayesian PMF estimation model and employ appropriate prior distributions for the model parameters, allowing for tuning-free rank inference via a single training run. We then derive a deterministic solution based on variational inference (VI) to approximate the posterior distributions of various model parameters. Additionally, we develop a scalable version of the VI-based approach by leveraging stochastic variational inference (SVI) to arrive at an efficient algorithm whose complexity scales sublinearly with the size of the dataset. Numerical experiments involving both synthetic data and real movie recommendation data illustrate the advantages of our VI and SVI-based methods in terms of estimation accuracy, automatic rank detection, and computational efficiency.
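
As an illustration of the low-rank CPD model of a joint PMF described above, the sketch below builds the joint distribution from a vector of latent-component weights and per-variable conditional PMFs, i.e. P(x_1, ..., x_N) = sum_r w_r * prod_n P(x_n | r). The function name `cpd_joint_pmf`, the nested-list layout of the factors, and the numbers in the example are assumptions for illustration; this is not the paper's estimation procedure, only the forward model whose rank R the paper infers.

```python
from itertools import product


def cpd_joint_pmf(weights, factors):
    """Assemble a joint PMF from a rank-R CPD (hypothetical helper).

    weights[r]       : probability of latent component r (sums to 1)
    factors[n][r][i] : P(X_n = i | component r)
    Returns a dict mapping index tuples (x_1, ..., x_N) to probabilities.
    """
    sizes = [len(f[0]) for f in factors]
    pmf = {}
    for idx in product(*(range(s) for s in sizes)):
        p = 0.0
        for r, w in enumerate(weights):
            term = w
            for n, i in enumerate(idx):
                term *= factors[n][r][i]
            p += term
        pmf[idx] = p
    return pmf


# Rank-2 example with two binary variables (made-up numbers).
weights = [0.6, 0.4]
factors = [
    [[0.9, 0.1], [0.2, 0.8]],  # P(X_1 | r) for r = 0, 1
    [[0.5, 0.5], [0.1, 0.9]],  # P(X_2 | r) for r = 0, 1
]
pmf = cpd_joint_pmf(weights, factors)
print(pmf[(0, 0)], sum(pmf.values()))
```

Because each factor column is itself a PMF and the weights sum to one, the assembled tensor is a valid joint PMF by construction.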


Temperature Optimization for Bayesian Deep Learning

arXiv.org Machine Learning

The Cold Posterior Effect (CPE) is a phenomenon in Bayesian Deep Learning (BDL), where tempering the posterior to a cold temperature often improves the predictive performance of the posterior predictive distribution (PPD). Although the term 'CPE' suggests colder temperatures are inherently better, the BDL community increasingly recognizes that this is not always the case. Despite this, there remains no systematic method for finding the optimal temperature beyond grid search. In this work, we propose a data-driven approach to select the temperature that maximizes test log-predictive density, treating the temperature as a model parameter and estimating it directly from the data. We empirically demonstrate that our method performs comparably to grid search, at a fraction of the cost, across both regression and classification tasks. Finally, we highlight the differing perspectives on CPE between the BDL and Generalized Bayes communities: while the former primarily focuses on predictive performance of the PPD, the latter emphasizes calibrated uncertainty and robustness to model misspecification; these distinct objectives lead to different temperature preferences.
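
To show what tempering does in the simplest possible case, the sketch below computes a tempered posterior for a Gaussian mean with known noise variance. It assumes likelihood tempering, where the posterior is proportional to likelihood^(1/T) times the prior; this is one common convention and may differ from the exact form used in the paper. The function name `tempered_gaussian_posterior` and all parameter names are hypothetical, and the paper's temperature-selection method is not reproduced here.

```python
def tempered_gaussian_posterior(data, prior_mean, prior_var, noise_var, temperature):
    """Posterior over a Gaussian mean with the likelihood raised to 1/T.

    Assumes a conjugate N(prior_mean, prior_var) prior and i.i.d.
    N(theta, noise_var) observations. Returns (mean, variance).
    """
    n = len(data)
    # Raising a Gaussian likelihood to the power 1/T is equivalent to
    # scaling the noise variance by T.
    eff_noise_var = temperature * noise_var
    precision = 1.0 / prior_var + n / eff_noise_var
    mean = (prior_mean / prior_var + sum(data) / eff_noise_var) / precision
    return mean, 1.0 / precision


data = [1.0, 2.0, 3.0]
for t in (1.0, 0.5):
    print(t, tempered_gaussian_posterior(data, 0.0, 1.0, 1.0, t))
```

With T = 1 this is the ordinary Bayesian posterior; a colder temperature (T < 1) weights the data more heavily, which shrinks the posterior variance and pulls the mean toward the sample average.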


Reviews: Bayesian Inference of Individualized Treatment Effects using Multi-task Gaussian Processes

Neural Information Processing Systems

The authors propose a method of estimating treatment effectiveness T(x) from a vector of patient features x. Treatment effectiveness is defined as (health outcome with treatment Yw) - (health outcome without treatment Y(1-w)). Presumably a health outcome might be something like survival time. If a patient survives 27 months with the treatment and only 9 without then the effectiveness T(x) would be 18 months? The authors estimate models of "outcome with treatment" and "outcome without treatment" jointly using RKHS kernel approximations on the whole dataset (I think there is a shared kernel). For a specific patient the effectiveness is based on the actual outcome of the patient which will be based on their features and their treatment condition minus the population model for the features of the opposite or counterfactual treatment condition.


Reviews: Doubly Robust Bayesian Inference for Non-Stationary Streaming Data with \beta-Divergences

Neural Information Processing Systems

Overview: The paper introduces a robust online change point detection algorithm for non-stationary time-series data. Robustness comes as a by-product of minimizing the \beta-divergence between the data and the fitted model, as opposed to the KL divergence used in standard Bayesian inference. In generalized Bayesian inference the posteriors are intractable. The paper mitigates this problem by resorting to a structural variational approximation, which is proved to be exact as \beta converges to zero. The paper also discusses systematic approaches to initialize \beta and refine it online.


Reviews: Predictive Approximate Bayesian Computation via Saddle Points

Neural Information Processing Systems

I am happy with all of your responses, though slightly confused over Q2 (rev2). One can't draw samples from improper priors in the first place, and other techniques (such as Rodrigues et al) won't save you there. You simply need to draw your samples from a distribution that is not the prior. I am still positively inclined towards this paper, and following the response and comparison to EP-ABC I will increase my score to 7 (from 6). Of course when the prior is improper or merely diffuse with respect to the posterior this will be impossible or at best highly inefficient.


Reviews: Constructing Deep Neural Networks by Bayesian Network Structure Learning

Neural Information Processing Systems

The presented method learns the structure of a deep ANN by first learning a BN and then constructing the ANN from this BN. The authors state that they "propose a new interpretation for depth and inter-layer connectivity in deep neural networks". Neurons in deep layers represent low-order conditional independencies (i.e., a small conditioning set) and those in 'early' (non-deep) layers represent high-order CI relationships. These are all CI relations in "X", i.e., the input vector of (observed) random variables. Perhaps I am missing something here but I could not find an argument as to why this is a principled way to build deep ANNs with good performance.


Reviews: A Bayesian Approach to Generative Adversarial Imitation Learning

Neural Information Processing Systems

It seems that this could perhaps be expressed more concisely using the output of the discriminator (and the true label) as functions, rather than introducing new random variables. Further, it seems the algorithm is described in sufficient detail to be re-implemented. The experiments are missing some detail to be reproduced or interpreted (e.g.


Reviews: Robust Conditional Probabilities

Neural Information Processing Systems

This paper studies the problem of computing probability bounds, more specifically bounds on the probabilities of atoms of the joint space and on the conditional probabilities of the class, under the assumption that only some pairwise marginals and some univariate marginal values are known. The idea is that such marginals may be easier to obtain than fully specified probabilities, and that cautious inferences can then be used to produce predictions. It is shown that when the marginals follow a tree structure (results are extended to a few other structures), the problem can actually be solved in closed, analytical form, relating it to cover set and maximum flow problems. Some experiments performed on neural networks show that this simple method is actually competitive with other more complex approaches (Ladder, VAE), while outperforming methods of comparable complexity. The paper is elegantly written, with quite understandable and significant results.