Learning Graphical Models
Deep Kalman Filters
Krishnan, Rahul G., Shalit, Uri, Sontag, David
Kalman Filters are one of the most influential models of time-varying phenomena. They admit an intuitive probabilistic interpretation, have a simple functional form, and enjoy widespread adoption in a variety of disciplines. Motivated by recent variational methods for learning deep generative models, we introduce a unified algorithm to efficiently learn a broad spectrum of Kalman filters. Of particular interest is the use of temporal generative models for counterfactual inference. We investigate the efficacy of such models for counterfactual inference, and to that end we introduce the "Healing MNIST" dataset where long-term structure, noise and actions are applied to sequences of digits. We show the efficacy of our method for modeling this dataset. We further show how our model can be used for counterfactual inference for patients, based on electronic health record data of 8,000 patients over 4.5 years.
Causal inference using invariant prediction: identification and confidence intervals
Peters, Jonas, Bühlmann, Peter, Meinshausen, Nicolai
What is the difference of a prediction that is made with a causal model and a non-causal model? Suppose we intervene on the predictor variables or change the whole environment. The predictions from a causal model will in general work as well under interventions as for observational data. In contrast, predictions from a non-causal model can potentially be very wrong if we actively intervene on variables. Here, we propose to exploit this invariance of a prediction under a causal model for causal inference: given different experimental settings (for example various interventions) we collect all models that do show invariance in their predictive accuracy across settings and interventions. The causal model will be a member of this set of models with high probability. This approach yields valid confidence intervals for the causal relationships in quite general scenarios. We examine the example of structural equation models in more detail and provide sufficient assumptions under which the set of causal predictors becomes identifiable. We further investigate robustness properties of our approach under model misspecification and discuss possible extensions. The empirical properties are studied for various data sets, including large-scale gene perturbation experiments.
Maximum Likelihood Estimation for Single Linkage Hierarchical Clustering
Zhu, Dekang, Guralnik, Dan P., Wang, Xuezhi, Li, Xiang, Moran, Bill
Clustering is a common technique for statistical data analysis, widely used in data mining, machine learning, pattern recognition, image analysis, bioinformatics and cyber security. Conventional ("flat", "hard") clustering methods accept a finite metric space (O, d) as input and return a partition of O as their output. Hierarchical clustering (HC) methods have a different philosophy: their output is an entire hierarchy of partitions, called a dendrogram, capable of exhibiting multi-scale structure in the data set [1, 2]. Rather than fixing the required number of clusters in advance, as is common for many flat clustering algorithms, it is more informative to furnish a hierarchy of clusters, providing an opportunity to choose a partition at a scale most natural for the context of the task at hand. Many HC methods require linkage functions to provide a measure of dissimilarity between clusters (see [3, 4] for a fairly recent review). Some commonly used linkage functions are single linkage, complete linkage, average linkage, etc. The SLHC method, though suffering from the so called "chaining effect", remains popular for large scale applications [5] because of the low complexity of implementing it using minimum spanning trees (MST) [6].
Natural Language Understanding with Distributed Representation
As the name of the course suggests, this lecture note introduces readers to a neural network based approach to natural language understanding/processing. In order to make it as self-contained as possible, I spend much time on describing basics of machine learning and neural networks, only after which how they are used for natural languages is introduced. On the language front, I almost solely focus on language modelling and machine translation, two of which I personally find most fascinating and most fundamental to natural language understanding. After about a month of lectures and about 40 pages of writing this lecture note, I found this fascinating note [47] by Yoav Goldberg on neural network models for natural language processing. This note deals with wider topics on natural language processing with distributed representations in more details, and I highly recommend you to read it (hopefully along with this lecture note.)
Private Posterior distributions from Variational approximations
Karwa, Vishesh, Kifer, Dan, Slavković, Aleksandra B.
Privacy preserving mechanisms such as differential privacy inject additional randomness in the form of noise in the data, beyond the sampling mechanism. Ignoring this additional noise can lead to inaccurate and invalid inferences. In this paper, we incorporate the privacy mechanism explicitly into the likelihood function by treating the original data as missing, with an end goal of estimating posterior distributions over model parameters. This leads to a principled way of performing valid statistical inference using private data, however, the corresponding likelihoods are intractable. In this paper, we derive fast and accurate variational approximations to tackle such intractable likelihoods that arise due to privacy. We focus on estimating posterior distributions of parameters of the naive Bayes log-linear model, where the sufficient statistics of this model are shared using a differentially private interface. Using a simulation study, we show that the posterior approximations outperform the naive method of ignoring the noise addition mechanism.
Searching for Objects using Structure in Indoor Scenes
Nagaraja, Varun K., Morariu, Vlad I., Davis, Larry S.
To identify the location of objects of a particular class, a passive computer vision system generally processes all the regions in an image to finally output few regions. However, we can use structure in the scene to search for objects without processing the entire image. We propose a search technique that sequentially processes image regions such that the regions that are more likely to correspond to the query class object are explored earlier. We frame the problem as a Markov decision process and use an imitation learning algorithm to learn a search strategy. Since structure in the scene is essential for search, we work with indoor scene images as they contain both unary scene context information and object-object context in the scene. We perform experiments on the NYU-depth v2 dataset and show that the unary scene context features alone can achieve a significantly high average precision while processing only 20-25\% of the regions for classes like bed and sofa. By considering object-object context along with the scene context features, the performance is further improved for classes like counter, lamp, pillow and sofa.
Stick-Breaking Policy Learning in Dec-POMDPs
Liu, Miao, Amato, Christopher, Liao, Xuejun, Carin, Lawrence, How, Jonathan P.
Expectation maximization (EM) has recently been shown to be an efficient algorithm for learning finite-state controllers (FSCs) in large decentralized POMDPs (Dec-POMDPs). However, current methods use fixed-size FSCs and often converge to maxima that are far from optimal. This paper considers a variable-size FSC to represent the local policy of each agent. These variable-size FSCs are constructed using a stick-breaking prior, leading to a new framework called \emph{decentralized stick-breaking policy representation} (Dec-SBPR). This approach learns the controller parameters with a variational Bayesian algorithm without having to assume that the Dec-POMDP model is available. The performance of Dec-SBPR is demonstrated on several benchmark problems, showing that the algorithm scales to large problems while outperforming other state-of-the-art methods.
Functional Gaussian Process Model for Bayesian Nonparametric Analysis
Duan, Leo L., Wang, Xia, Szczesniak, Rhonda D.
Gaussian process is a theoretically appealing model for nonparametric analysis, but its computational cumbersomeness hinders its use in large scale and the existing reduced-rank solutions are usually heuristic. In this work, we propose a novel construction of Gaussian process as a projection from fixed discrete frequencies to any continuous location. This leads to a valid stochastic process that has a theoretic support with the reduced rank in the spectral density, as well as a high-speed computing algorithm. Our method provides accurate estimates for the covariance parameters and concise form of predictive distribution for spatial prediction. For non-stationary data, we adopt the mixture framework with a customized spectral dependency structure. This enables clustering based on local stationarity, while maintains the joint Gaussianness. Our work is directly applicable in solving some of the challenges in the spatial data, such as large scale computation, anisotropic covariance, spatio-temporal modeling, etc. We illustrate the uses of the model via simulations and an application on a massive dataset.
Black box variational inference for state space models
Archer, Evan, Park, Il Memming, Buesing, Lars, Cunningham, John, Paninski, Liam
Latent variable time-series models are among the most heavily used tools from machine learning and applied statistics. These models have the advantage of learning latent structure both from noisy observations and from the temporal ordering in the data, where it is assumed that meaningful correlation structure exists across time. A few highly-structured models, such as the linear dynamical system with linear-Gaussian observations, have closed-form inference procedures (e.g. the Kalman Filter), but this case is an exception to the general rule that exact posterior inference in more complex generative models is intractable. Consequently, much work in time-series modeling focuses on approximate inference procedures for one particular class of models. Here, we extend recent developments in stochastic variational inference to develop a `black-box' approximate inference technique for latent variable models with latent dynamical structure. We propose a structured Gaussian variational approximate posterior that carries the same intuition as the standard Kalman filter-smoother but, importantly, permits us to use the same inference approach to approximate the posterior of much more general, nonlinear latent variable generative models. We show that our approach recovers accurate estimates in the case of basic models with closed-form posteriors, and more interestingly performs well in comparison to variational approaches that were designed in a bespoke fashion for specific non-conjugate models.
Bayesian Evidence and Model Selection
Knuth, Kevin H., Habeck, Michael, Malakar, Nabin K., Mubeen, Asim M., Placek, Ben
In this paper we review the concepts of Bayesian evidence and Bayes factors, also known as log odds ratios, and their application to model selection. The theory is presented along with a discussion of analytic, approximate and numerical techniques. Specific attention is paid to the Laplace approximation, variational Bayes, importance sampling, thermodynamic integration, and nested sampling and its recent variants. Analogies to statistical physics, from which many of these techniques originate, are discussed in order to provide readers with deeper insights that may lead to new techniques. The utility of Bayesian model testing in the domain sciences is demonstrated by presenting four specific practical examples considered within the context of signal processing in the areas of signal detection, sensor characterization, scientific model selection and molecular force characterization.