Bayesian Inference
Deep Multi-task Gaussian Processes for Survival Analysis with Competing Risks
Designing optimal treatment plans for patients with comorbidities requires accurate cause-specific mortality prognosis. Motivated by the recent availability of linked electronic health records, we develop a nonparametric Bayesian model for survival analysis with competing risks, which can be used for jointly assessing a patient's risk of multiple (competing) adverse outcomes. The model views a patient's survival times with respect to the competing risks as the outputs of a deep multi-task Gaussian process (DMGP), the inputs to which are the patients' covariates. Unlike parametric survival analysis methods based on Cox and Weibull models, our model uses DMGPs to capture complex non-linear interactions between the patients' covariates and cause-specific survival times, thereby learning flexible patient-specific and cause-specific survival curves, all in a data-driven fashion without explicit parametric assumptions on the hazard rates. We propose a variational inference algorithm that is capable of learning the model parameters from time-to-event data while handling right censoring. Experiments on synthetic and real data show that our model outperforms the state-of-the-art survival models.
Speeding Up Latent Variable Gaussian Graphical Model Estimation via Nonconvex Optimization
Xu, Pan, Ma, Jian, Gu, Quanquan
We study the estimation of the latent variable Gaussian graphical model (LVGGM), where the precision matrix is the superposition of a sparse matrix and a low-rank matrix. In order to speed up the estimation of the sparse plus low-rank components, we propose a sparsity constrained maximum likelihood estimator based on matrix factorization and an efficient alternating gradient descent algorithm with hard thresholding to solve it. Our algorithm is orders of magnitude faster than the convex relaxation based methods for LVGGM. In addition, we prove that our algorithm is guaranteed to linearly converge to the unknown sparse and low-rank components up to the optimal statistical precision. Experiments on both synthetic and genomic data demonstrate the superiority of our algorithm over the state-of-the-art algorithms and corroborate our theory.
Learning Chordal Markov Networks via Branch and Bound
Rantanen, Kari, Hyttinen, Antti, Järvisalo, Matti
We present a new algorithmic approach for the task of finding a chordal Markov network structure that maximizes a given scoring function. The algorithm is based on branch and bound and integrates dynamic programming for both domain pruning and for obtaining strong bounds for search-space pruning. Empirically, we show that the approach dominates in terms of running times a recent integer programming approach (and thereby also a recent constraint optimization approach) for the problem.
Model evidence from nonequilibrium simulations
The marginal likelihood, or model evidence, is a key quantity in Bayesian parameter estimation and model comparison. For many probabilistic models, computation of the marginal likelihood is challenging, because it involves a sum or integral over an enormous parameter space. Markov chain Monte Carlo (MCMC) is a powerful approach to compute marginal likelihoods. Various MCMC algorithms and evidence estimators have been proposed in the literature. Here we discuss the use of nonequilibrium techniques for estimating the marginal likelihood. Nonequilibrium estimators build on recent developments in statistical physics and are known as annealed importance sampling (AIS) and reverse AIS in probabilistic machine learning. We introduce estimators for the model evidence that combine forward and backward simulations and show for various challenging models that the evidence estimators outperform forward and reverse AIS.
Reliable Decision Support using Counterfactual Models
Decision-makers are faced with the challenge of estimating what is likely to happen when they take an action. For instance, if I choose not to treat this patient, are they likely to die? Practitioners commonly use supervised learning algorithms to fit predictive models that help decision-makers reason about likely future outcomes, but we show that this approach is unreliable, and sometimes even dangerous. The key issue is that supervised learning algorithms are highly sensitive to the policy used to choose actions in the training data, which causes the model to capture relationships that do not generalize. We propose using a different learning objective that predicts counterfactuals instead of predicting outcomes under an existing action policy as in supervised learning. To support decision-making in temporal settings, we introduce the Counterfactual Gaussian Process (CGP) to predict the counterfactual future progression of continuous-time trajectories under sequences of future actions. We demonstrate the benefits of the CGP on two important decision-support tasks: risk prediction and "what if?" reasoning for individualized treatment planning.
Optimal Sample Complexity of M-wise Data for Top-K Ranking
Jang, Minje, Kim, Sunghyun, Suh, Changho, Oh, Sewoong
We explore the top-K rank aggregation problem in which one aims to recover a consistent ordering that focuses on top-K ranked items based on partially revealed preference information. We examine an M-wise comparison model that builds on the Plackett-Luce (PL) model where for each sample, M items are ranked according to their perceived utilities modeled as noisy observations of their underlying true utilities. As our result, we characterize the minimax optimality on the sample size for top-K ranking. The optimal sample size turns out to be inversely proportional to M. We devise an algorithm that effectively converts M-wise samples into pairwise ones and employs a spectral method using the refined data. In demonstrating its optimality, we develop a novel technique for deriving tight $\ell_\infty$ estimation error bounds, which is key to accurately analyzing the performance of top-K ranking algorithms, but has been challenging. Recent work relied on an additional maximum-likelihood estimation (MLE) stage merged with a spectral method to attain good estimates in $\ell_\infty$ error to achieve the limit for the pairwise model. In contrast, although it is valid in slightly restricted regimes, our result demonstrates a spectral method alone to be sufficient for the general M-wise model. We run numerical experiments using synthetic data and confirm that the optimal sample size decreases at the rate of 1/M. Moreover, running our algorithm on real-world data, we find that its applicability extends to settings that may not fit the PL model.
Deep Dynamic Poisson Factorization Model
Gong, Chengyue, huang, win-bin
A new model, named as deep dynamic poisson factorization model, is proposed in this paper for analyzing sequential count vectors. The model based on the Poisson Factor Analysis method captures dependence among time steps by neural networks, representing the implicit distributions. Local complicated relationship is obtained from local implicit distribution, and deep latent structure is exploited to get the long-time dependence. Variational inference on latent variables and gradient descent based on the loss functions derived from variational distribution is performed in our inference. Synthetic datasets and real-world datasets are applied to the proposed model and our results show good predicting and fitting performance with interpretable latent structure.
Predicting User Activity Level In Point Processes With Mass Transport Equation
Wang, Yichen, Ye, Xiaojing, Zha, Hongyuan, Song, Le
Point processes are powerful tools to model user activities and have a plethora of applications in social sciences. Predicting user activities based on point processes is a central problem. However, existing works are mostly problem specific, use heuristics, or simplify the stochastic nature of point processes. In this paper, we propose a framework that provides an efficient estimator of the probability mass function of point processes. In particular, we design a key reformulation of the prediction problem, and further derive a differential-difference equation to compute a conditional probability mass function. Our framework is applicable to general point processes and prediction tasks, and achieves superb predictive and efficiency performance in diverse real-world applications compared to the state of the art.
Flexible statistical inference for mechanistic models of neural dynamics
Lueckmann, Jan-Matthis, Goncalves, Pedro J., Bassetto, Giacomo, Öcal, Kaan, Nonnenmacher, Marcel, Macke, Jakob H.
Mechanistic models of single-neuron dynamics have been extensively studied in computational neuroscience. However, identifying which models can quantitatively reproduce empirically measured data has been challenging. We propose to overcome this limitation by using likelihood-free inference approaches (also known as Approximate Bayesian Computation, ABC) to perform full Bayesian inference on single-neuron models. Our approach builds on recent advances in ABC by learning a neural network which maps features of the observed data to the posterior distribution over parameters. We learn a Bayesian mixture-density network approximating the posterior over multiple rounds of adaptively chosen simulations. Furthermore, we propose an efficient approach for handling missing features and parameter settings for which the simulator fails, as well as a strategy for automatically learning relevant features using recurrent neural networks. On synthetic data, our approach efficiently estimates posterior distributions and recovers ground-truth parameters. On in-vitro recordings of membrane voltages, we recover multivariate posteriors over biophysical parameters, which yield model-predicted voltage traces that accurately match empirical data. Our approach will enable neuroscientists to perform Bayesian inference on complex neuron models without having to design model-specific algorithms, closing the gap between mechanistic and statistical approaches to single-neuron modelling.
GP CaKe: Effective brain connectivity with causal kernels
Ambrogioni, Luca, Hinne, Max, Gerven, Marcel Van, Maris, Eric
A fundamental goal in network neuroscience is to understand how activity in one brain region drives activity elsewhere, a process referred to as effective connectivity. Here we propose to model this causal interaction using integro-differential equations and causal kernels that allow for a rich analysis of effective connectivity. The approach combines the tractability and flexibility of autoregressive modeling with the biophysical interpretability of dynamic causal modeling. The causal kernels are learned nonparametrically using Gaussian process regression, yielding an efficient framework for causal inference. We construct a novel class of causal covariance functions that enforce the desired properties of the causal kernels, an approach which we call GP CaKe. By construction, the model and its hyperparameters have biophysical meaning and are therefore easily interpretable. We demonstrate the efficacy of GP CaKe on a number of simulations and give an example of a realistic application on magnetoencephalography (MEG) data.