Goto

Collaborating Authors

 Uncertainty


Bayesian Incremental Learning for Deep Neural Networks

arXiv.org Machine Learning

In industrial machine learning pipelines, data often arrive in parts. Particularly in the case of deep neural networks, it may be too expensive to train the model from scratch each time, so one would rather use a previously learned model and the new data to improve performance. However, deep neural networks are prone to getting stuck in a suboptimal solution when trained on only new data as compared to the full dataset. Our work focuses on a continuous learning setup where the task is always the same and new parts of data arrive sequentially. We apply a Bayesian approach to update the posterior approximation with each new piece of data and find this method to outperform the traditional approach in our experiments.


Operator Variational Inference

arXiv.org Machine Learning

Variational inference is an umbrella term for algorithms which cast Bayesian inference as optimization. Classically, variational inference uses the Kullback-Leibler divergence to define the optimization. Though this divergence has been widely used, the resultant posterior approximation can suffer from undesirable statistical properties. To address this, we reexamine variational inference from its roots as an optimization problem. We use operators, or functions of functions, to design variational objectives. As one example, we design a variational objective with a Langevin-Stein operator. We develop a black box algorithm, operator variational inference (OPVI), for optimizing any operator objective. Importantly, operators enable us to make explicit the statistical and computational tradeoffs for variational inference. We can characterize different properties of variational objectives, such as objectives that admit data subsampling---allowing inference to scale to massive data---as well as objectives that admit variational programs---a rich class of posterior approximations that does not require a tractable density. We illustrate the benefits of OPVI on a mixture model and a generative model of images.


A Probabilistic Disease Progression Model for Predicting Future Clinical Outcome

arXiv.org Machine Learning

In this work, we consider the problem of predicting the course of a progressive disease, such as cancer or Alzheimer's. Progressive diseases often start with mild symptoms that might precede a diagnosis, and each patient follows their own trajectory. Patient trajectories exhibit wild variability, which can be associated with many factors such as genotype, age, or sex. An additional layer of complexity is that, in real life, the amount and type of data available for each patient can differ significantly. For example, for one patient we might have no prior history, whereas for another patient we might have detailed clinical assessments obtained at multiple prior time-points. This paper presents a probabilistic model that can handle multiple modalities (including images and clinical assessments) and variable patient histories with irregular timings and missing entries, to predict clinical scores at future time-points. We use a sigmoidal function to model latent disease progression, which gives rise to clinical observations in our generative model. We implemented an approximate Bayesian inference strategy on the proposed model to estimate the parameters on data from a large population of subjects. Furthermore, the Bayesian framework enables the model to automatically fine-tune its predictions based on historical observations that might be available on the test subject. We applied our method to a longitudinal Alzheimer's disease dataset with more than 3000 subjects [23] and present a detailed empirical analysis of prediction performance under different scenarios, with comparisons against several benchmarks. We also demonstrate how the proposed model can be interrogated to glean insights about temporal dynamics in Alzheimer's disease.


SAM: Structural Agnostic Model, Causal Discovery and Penalized Adversarial Learning

arXiv.org Machine Learning

We present the Structural Agnostic Model (SAM), a framework to estimate end-to-end non-acyclic causal graphs from observational data. In a nutshell, SAM implements an adversarial game in which a separate model generates each variable, given real values from all others. In tandem, a discriminator attempts to distinguish between the joint distributions of real and generated samples. Finally, a sparsity penalty forces each generator to consider only a small subset of the variables, yielding a sparse causal graph. SAM scales easily to hundreds variables. Our experiments show the state-of-the-art performance of SAM on discovering causal structures and modeling interventions, in both acyclic and non-acyclic graphs.


Simulation and Calibration of a Fully Bayesian Marked Multidimensional Hawkes Process with Dissimilar Decays

arXiv.org Machine Learning

We propose a simulation method for multidimensional Hawkes processes based on superposition theory of point processes. This formulation allows us to design efficient simulations for Hawkes processes with differing exponentially decaying intensities. We demonstrate that inter-arrival times can be decomposed into simpler auxiliary variables that can be sampled directly, giving exact simulation with no approximation. We establish that the auxiliary variables provides information on the parent process for each event time. The algorithm correctness is shown by verifying the simulated intensities with their theoretical moments. A modular inference procedure consisting of Gibbs samplers through the auxiliary variable augmentation and adaptive rejection sampling is presented. Finally, we compare our proposed simulation method against existing methods, and find significant improvement in terms of algorithm speed. Our inference algorithm is used to discover the strengths of mutually excitations in real dark networks. Keywords: Hawkes process, marked point process, exact simulation, Bayesian inference.


Large Scale Automated Forecasting for Monitoring Network Safety and Security

arXiv.org Machine Learning

As outlined in [1], leveraging big data and real time analytics constitute two main research venues for OR/MS in the analytics age. Information and communication technologies (ICT) have experienced an exponential growth in the last few decades and most human activities, businesses and devices strongly depend on ICT [2]. With the advent of the Internet of Things (IoT), this interrelation will become even more evident and change dramatically the way in which different components of business and service systems interact. In parallel, risks concerning the security of ICT systems are also growing, as pointed out e.g. in [3].


Black-box Variational Inference for Stochastic Differential Equations

arXiv.org Machine Learning

Parameter inference for stochastic differential equations is challenging due to the presence of a latent diffusion process. Working with an Euler-Maruyama discretisation for the diffusion, we use variational inference to jointly learn the parameters and the diffusion paths. We use a standard mean-field variational approximation of the parameter posterior, and introduce a recurrent neural network to approximate the posterior for the diffusion paths conditional on the parameters. This neural network learns how to provide Gaussian state transitions which bridge between observations in a very similar way to the conditioned diffusion process. The resulting black-box inference method can be applied to any SDE system with light tuning requirements. We illustrate the method on a Lotka-Volterra system and an epidemic model, producing accurate parameter estimates in a few hours.


Estimating activity cycles with probabilistic methods I. Bayesian Generalised Lomb-Scargle Periodogram with Trend

arXiv.org Machine Learning

Period estimation is one of the central topics in astronomical time series analysis, where data is often unevenly sampled. Especially challenging are studies of stellar magnetic cycles, as there the periods looked for are of the order of the same length than the datasets themselves. The datasets often contain trends, the origin of which is either a real long-term cycle or an instrumental effect, but these effects cannot be reliably separated, while they can lead to erroneous period determinations if not properly handled. In this study we aim at developing a method that can handle the trends properly, and by performing extensive set of testing, we show that this is the optimal procedure when contrasted with methods that do not include the trend directly to the model. The effect of the form of the noise (whether constant or heteroscedastic) on the results is also investigated. We introduce a Bayesian Generalised Lomb-Scargle Periodogram with Trend (BGLST), which is a probabilistic linear regression model using Gaussian priors for the coefficients and uniform prior for the frequency parameter. We show, using synthetic data, that when there is no prior information on whether and to what extent the true model of the data contains a linear trend, the introduced BGLST method is preferable to the methods which either detrend the data or leave the data untrended before fitting the periodic model. Whether to use noise with different than constant variance in the model depends on the density of the data sampling as well as on the true noise type of the process.


Efficient and principled score estimation with Nystr\"om kernel exponential families

arXiv.org Machine Learning

We propose a fast method with statistical guarantees for learning an exponential family density model where the natural parameter is in a reproducing kernel Hilbert space, and may be infinite-dimensional. The model is learned by fitting the derivative of the log density, the score, thus avoiding the need to compute a normalization constant. Our approach improves the computational efficiency of an earlier solution by using a low-rank, Nystr\"om-like solution. The new solution retains the consistency and convergence rates of the full-rank solution (exactly in Fisher distance, and nearly in other distances), with guarantees on the degree of cost and storage reduction. We evaluate the method in experiments on density estimation and in the construction of an adaptive Hamiltonian Monte Carlo sampler. Compared to an existing score learning approach using a denoising autoencoder, our estimator is empirically more data-efficient when estimating the score, runs faster, and has fewer parameters (which can be tuned in a principled and interpretable way), in addition to providing statistical guarantees.


Causal inference for cloud computing

arXiv.org Artificial Intelligence

Cloud computing involves complex technical and economical systems and interactions. This brings about various challenges, two of which are: (1) debugging and control of computing systems with the help of sandbox experiments, and (2) prediction of the cost of "spot" resources for decision making of cloud clients. In this paper, we formalize debugging by counterfactual probabilities and control by post-(soft-)interventional probabilities. We prove that counterfactuals can approximately be calculated from a "stochastic" graphical causal model (while they are originally defined only for "deterministic" functional causal models), and based on this sketch an approach to address problem (1). To address problem (2), we formalize bidding by post-(soft-)interventional probabilities and present a simple mathematical result on approximate integration of "incomplete" conditional probability distributions. We show how this can be used by cloud clients to trade off privacy against predictability of the outcome of their bidding actions in a toy scenario. We report experiments on simulated and real data.