Undirected Networks
Probabilistic Inference with Generating Functions for Poisson Latent Variable Models
Winner, Kevin, Sheldon, Daniel R.
Graphical models with latent count variables arise in a number of fields. Standard exact inference techniques such as variable elimination and belief propagation do not apply to these models because the latent variables have countably infinite support. As a result, approximations such as truncation or MCMC are employed. We present the first exact inference algorithms for a class of models with latent count variables by developing a novel representation of countably infinite factors as probability generating functions, and then performing variable elimination with generating functions. Our approach is exact, runs in pseudo-polynomial time, and is much faster than existing approximate techniques. It leads to better parameter estimates for problems in population ecology by avoiding error introduced by approximate likelihood computations.
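Here is a small sketch of the generating-function idea. It uses the simplest model of this kind, chosen for illustration: N ~ Poisson(λ) and Y | N ~ Binomial(N, ρ). N is eliminated analytically, giving E[s^Y] = E[(1 - ρ + ρs)^N] = exp(λ((1 - ρ + ρs) - 1)). The probabilities P(Y = k) are then read off as Taylor coefficients, with no truncation of N's infinite support. The helper names `series_exp` and `thinned_poisson_pmf` are hypothetical. The paper's algorithm covers a general class of models, and this sketch does not attempt that.

```python
import math


def series_exp(a, n_terms):
    """First n_terms Taylor coefficients of exp(a(s)), where a[k] is the
    coefficient of s**k in the power series a(s)."""
    a = list(a) + [0.0] * max(0, n_terms - len(a))
    b = [0.0] * n_terms
    b[0] = math.exp(a[0])
    for n in range(1, n_terms):
        # Differentiating b = exp(a) gives b' = a' b, i.e.
        # n * b_n = sum_{k=1}^{n} k * a_k * b_{n-k}.
        b[n] = sum(k * a[k] * b[n - k] for k in range(1, n + 1)) / n
    return b


def thinned_poisson_pmf(lam, rho, k_max):
    """P(Y = k) for k = 0..k_max, where N ~ Poisson(lam), Y | N ~ Binomial(N, rho).

    N is eliminated through its PGF G_N(t) = exp(lam * (t - 1)):
        E[s**Y] = G_N(1 - rho + rho * s) = exp(lam * rho * (s - 1)).
    """
    exponent = [-lam * rho, lam * rho]  # power series of lam * rho * (s - 1)
    return series_exp(exponent, k_max + 1)


probs = thinned_poisson_pmf(lam=5.0, rho=0.4, k_max=10)
```

For this model the result should equal the Poisson(λρ) pmf, which gives a convenient check. The power-series exponential uses the recurrence derived from b' = a'b. Each coefficient is therefore exact up to floating-point rounding, not a truncation of the count variable.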
Stochastic Gradient Richardson-Romberg Markov Chain Monte Carlo
Durmus, Alain, Simsekli, Umut, Moulines, Eric, Badeau, Roland, Richard, Gaël
Stochastic Gradient Markov Chain Monte Carlo (SG-MCMC) algorithms have become increasingly popular for Bayesian inference in large-scale applications. Even though these methods have proved useful in several scenarios, their performance is often limited by their bias. In this study, we propose a novel sampling algorithm that aims to reduce the bias of SG-MCMC while keeping the variance at a reasonable level. Our approach is based on a numerical sequence acceleration method, namely Richardson-Romberg extrapolation, which simply boils down to running almost the same SG-MCMC algorithm twice in parallel with different step sizes. We illustrate our framework on the popular Stochastic Gradient Langevin Dynamics (SGLD) algorithm and propose a novel SG-MCMC algorithm referred to as Stochastic Gradient Richardson-Romberg Langevin Dynamics (SGRRLD). We provide a formal theoretical analysis and show that SGRRLD is asymptotically consistent, satisfies a central limit theorem, and that its non-asymptotic bias and mean squared error can be bounded. Our results show that SGRRLD attains higher rates of convergence than SGLD in both the finite-time and asymptotic regimes, and that it achieves the theoretical accuracy of methods based on higher-order integrators. We support our findings with experiments on both synthetic and real data.
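The following is a rough sketch of the extrapolation step under several assumptions. The target is a 1-D standard normal with potential U(θ) = θ²/2. Gradients are exact, so both chains are plain unadjusted Langevin; replacing `grad_u` with a minibatch estimate would turn them into SGLD. The estimator is 2·(fine-chain average) - (coarse-chain average), which cancels the first-order term of the step-size bias. The fine chain uses step γ/2 and takes two steps per coarse step. Both chains share their Gaussian increments, a coupling assumed here to keep the variance of the difference small. The function name `rr_langevin` and all parameter values are illustrative, not the authors' implementation.

```python
import math
import random


def rr_langevin(grad_u, f, theta0, step, n_iter, seed=0):
    """Richardson-Romberg estimate of E_pi[f] from two coupled Langevin chains.

    The coarse chain uses step size `step`; the fine chain uses step / 2 and
    takes two steps per coarse step. Both chains use the same Gaussian draws.
    Returns (2 * fine_mean - coarse_mean, coarse_mean, fine_mean).
    """
    rng = random.Random(seed)
    h = step / 2.0
    coarse = fine = theta0
    coarse_sum = fine_sum = 0.0
    for _ in range(n_iter):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # Two fine Langevin steps of size h.
        fine = fine - h * grad_u(fine) + math.sqrt(2.0 * h) * z1
        fine_sum += f(fine)
        fine = fine - h * grad_u(fine) + math.sqrt(2.0 * h) * z2
        fine_sum += f(fine)
        # One coarse step of size 2h; sqrt(2h) * (z1 + z2) has variance 4h = 2 * step.
        coarse = coarse - step * grad_u(coarse) + math.sqrt(2.0 * h) * (z1 + z2)
        coarse_sum += f(coarse)
    coarse_mean = coarse_sum / n_iter
    fine_mean = fine_sum / (2 * n_iter)
    return 2.0 * fine_mean - coarse_mean, coarse_mean, fine_mean


# Standard normal target: U(theta) = theta**2 / 2, so grad U(theta) = theta.
rr, coarse_mean, fine_mean = rr_langevin(
    grad_u=lambda t: t, f=lambda t: t * t, theta0=0.0, step=0.2, n_iter=100_000
)
```

Take step = 0.2 as an example. The coarse chain's stationary second moment is then 1/(1 - 0.1) ≈ 1.11 instead of the true value 1. The extrapolated estimate should land near 1: its residual bias for this Gaussian is about 0.006, plus Monte Carlo error.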
PAC Reinforcement Learning with Rich Observations
Krishnamurthy, Akshay, Agarwal, Alekh, Langford, John
We propose and study a new model for reinforcement learning with rich observations, generalizing contextual bandits to sequential decision making. These models require an agent to take actions based on observations (features) with the goal of achieving long-term performance competitive with a large set of policies. To avoid barriers to sample-efficient learning associated with large observation spaces and general POMDPs, we focus on problems that can be summarized by a small number of hidden states and have long-term rewards that are predictable by a reactive function class. In this setting, we design and analyze a new reinforcement learning algorithm, Least Squares Value Elimination by Exploration. We prove that the algorithm learns near-optimal behavior after a number of episodes that is polynomial in all relevant parameters, logarithmic in the number of policies, and independent of the size of the observation space. Our result provides theoretical justification for reinforcement learning with function approximation.
A Credit Assignment Compiler for Joint Prediction
Chang, Kai-Wei, He, He, Ross, Stephane, Daumé III, Hal, Langford, John
Many machine learning applications involve jointly predicting multiple mutually dependent output variables. Learning to search is a family of methods in which the complex decision problem is cast into a sequence of decisions via a search space. Although these methods have shown promise both in theory and in practice, implementing them has been cumbersome. In this paper, we show that the search space can be defined by an arbitrary imperative program, turning learning to search into a credit assignment compiler. Together with algorithmic improvements to the compiler, this radically reduces both programming complexity and running time. We demonstrate the feasibility of our approach on multiple joint prediction tasks. In all cases, we obtain accuracies as high as those of alternative approaches, with drastically reduced execution and programming time.
Optimal Tagging with Markov Chain Optimization
Rosenfeld, Nir, Globerson, Amir
Many information systems use tags and keywords to describe and annotate content. Tags allow for efficient organization and categorization of items and facilitate relevant search queries. As such, the set of tags selected for an item can considerably affect the volume of traffic that eventually reaches it. In tagging systems where tags are chosen exclusively by an item's owner, who in turn wants to maximize traffic, a principled approach to assigning tags can prove valuable. In this paper we introduce the problem of optimal tagging, where the task is to choose a subset of tags for a new item such that the probability of browsing users reaching that item is maximized. We formulate the problem by modeling traffic with a Markov chain and asking how transitions in this chain should be modified to maximize traffic into a particular state of interest. The resulting optimization problem involves maximizing a set function under a cardinality constraint. We show that the optimization problem is NP-hard, but that, because the objective is monotone and submodular, a simple greedy algorithm achieves a (1-1/e)-approximation. Furthermore, the structure of the problem allows the greedy step to be computed efficiently. To demonstrate the effectiveness of our method, we perform experiments on three tagging datasets and show that the greedy algorithm outperforms other baselines.
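The generic greedy step behind the (1-1/e) guarantee can be sketched as follows. The objective here is a hypothetical stand-in and is not the paper's Markov-chain traffic model. Each user group has a weight and a set of tags it browses, and the item counts as "reached" by a group if it shares at least one tag with it. This weighted-coverage function is monotone and submodular, which is the property the guarantee needs. The paper's efficient computation of the greedy step is not reproduced. `greedy_maximize`, `reach_probability`, and `USERS` are illustrative names.

```python
def greedy_maximize(candidates, f, k):
    """Greedy maximization of a set function f subject to |S| <= k.

    For monotone submodular f this attains at least (1 - 1/e) of the optimum.
    """
    chosen = []
    for _ in range(k):
        best, best_gain = None, float("-inf")
        base = f(chosen)
        for c in candidates:
            if c in chosen:
                continue
            gain = f(chosen + [c]) - base  # marginal gain of adding c
            if gain > best_gain:           # ties keep the earliest candidate
                best, best_gain = c, gain
        if best is None:                   # fewer than k candidates available
            break
        chosen.append(best)
    return chosen


# Hypothetical stand-in objective (weighted coverage), not the Markov-chain model:
# (weight of a user group, set of tags that group browses).
USERS = [(0.5, {"python", "ml"}), (0.3, {"stats"}), (0.2, {"ml", "stats"})]


def reach_probability(tags):
    """Total weight of user groups that share at least one tag with the item."""
    tags = set(tags)
    return sum(w for w, interests in USERS if interests & tags)


best_tags = greedy_maximize(["python", "ml", "stats"], reach_probability, 2)
```

In this toy instance the first pick is "ml", which covers weight 0.7. The second is "stats", which adds the remaining 0.3. Any monotone submodular objective can be swapped in for `reach_probability` without changing the loop.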
Dynamic Mode Decomposition with Reproducing Kernels for Koopman Spectral Analysis
A spectral analysis of the Koopman operator, an infinite-dimensional linear operator acting on observables, gives a (modal) description of the global behavior of a nonlinear dynamical system without any explicit prior knowledge of its governing equations. In this paper, we consider a spectral analysis of the Koopman operator in a reproducing kernel Hilbert space (RKHS). We propose a modal decomposition algorithm that performs this analysis using finite-length data sequences generated by a nonlinear system. The algorithm essentially reduces to computing a set of orthogonal bases for the Krylov matrix in the RKHS and the eigendecomposition of the projection of the Koopman operator onto the subspace spanned by those bases. The algorithm decomposes the dynamics into a finite number of modes, so it can be regarded as a feature extraction procedure for nonlinear dynamical systems. We therefore also consider machine learning applications that use the extracted features. We illustrate the method on these applications using synthetic and real-world data.
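Below is a minimal sketch of kernel-based Koopman eigenvalue estimation, assuming NumPy. It follows the kernel extended-DMD formulation, which is a swapped-in technique and not the Krylov-basis orthogonalization described above. The inputs are snapshot pairs (x_i, y_i) with y_i = F(x_i). The method forms the Gram matrices G[i, j] = k(x_i, x_j) and A[i, j] = k(y_i, x_j), and eigendecomposes G = QΣ²Q^T. It then takes the eigenvalues of Σ⁻¹QᵀAQΣ⁻¹ as approximate Koopman eigenvalues. `kernel_dmd_eigs`, `poly2_kernel`, and the tolerance are hypothetical choices.

```python
import numpy as np


def kernel_dmd_eigs(X, Y, kernel, rel_tol=1e-10):
    """Approximate Koopman eigenvalues from snapshot pairs Y[i] = F(X[i]).

    X, Y: arrays of shape (m, d). kernel(U, V) returns the matrix [k(u_i, v_j)].
    """
    G = kernel(X, X)                       # G[i, j] = k(x_i, x_j)
    A = kernel(Y, X)                       # A[i, j] = k(y_i, x_j)
    evals, Q = np.linalg.eigh(G)           # G = Q diag(evals) Q^T
    keep = evals > rel_tol * evals.max()   # drop numerically null directions
    Q, s = Q[:, keep], np.sqrt(evals[keep])
    s_inv = np.diag(1.0 / s)
    K_hat = s_inv @ Q.T @ A @ Q @ s_inv    # Koopman matrix in the data-driven basis
    return np.linalg.eigvals(K_hat)


def poly2_kernel(U, V):
    """Inhomogeneous quadratic kernel k(u, v) = (1 + u.v)^2."""
    return (1.0 + U @ V.T) ** 2


traj = 0.9 ** np.arange(11)                # x_{t+1} = 0.9 x_t, starting at x_0 = 1
X, Y = traj[:-1, None], traj[1:, None]
eigs = np.sort(kernel_dmd_eigs(X, Y, poly2_kernel).real)
```

Consider the linear map x ↦ 0.9x with the kernel (1 + xy)², whose feature space spans 1, x and x². That span is invariant under the Koopman operator, so the recovered eigenvalues should be 1, 0.9 and 0.81 up to rounding.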
Scan Order in Gibbs Sampling: Models in Which it Matters and Bounds on How Much
He, Bryan D., De Sa, Christopher M., Mitliagkas, Ioannis, Ré, Christopher
Gibbs sampling is a Markov Chain Monte Carlo sampling technique that iteratively samples variables from their conditional distributions. There are two common scan orders for the variables: random scan and systematic scan. Because of the benefits of hardware locality, systematic scan is commonly used, even though most statistical guarantees hold only for random scan. While it has been conjectured that the mixing times of random scan and systematic scan differ by at most a logarithmic factor, we show by counterexample that this is not the case, and we prove that under mild conditions the mixing times differ by at most a polynomial factor. To prove these relative bounds, we introduce a method of augmenting the state space to study systematic scan using conductance.
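The sketch below builds exact transition matrices for a tiny model to make the two scan orders concrete. The model is assumed for illustration: a 3-variable binary Ising chain with coupling 0.8, not one of the paper's constructions. Each single-site kernel P_i resamples variable i from its conditional under π. Systematic scan is the product P_1 P_2 ... P_n, and random scan is the average (P_1 + ... + P_n)/n. Both leave π invariant, and `tv_from` measures the total variation distance to π. The code does not implement the paper's state-space augmentation or conductance bounds. All function names are hypothetical.

```python
import functools
import itertools

import numpy as np


def ising_chain(n, coupling):
    """Target pi over {0,1}^n (in itertools.product order) for a 1-D Ising chain."""
    states = list(itertools.product([0, 1], repeat=n))
    w = np.array([
        np.exp(coupling * sum(1 if s[k] == s[k + 1] else -1 for k in range(n - 1)))
        for s in states
    ])
    return w / w.sum()


def single_site_kernels(pi, n):
    """P_i resamples variable i from its conditional distribution under pi."""
    states = list(itertools.product([0, 1], repeat=n))
    index = {s: j for j, s in enumerate(states)}
    kernels = []
    for i in range(n):
        P = np.zeros((len(states), len(states)))
        for s in states:
            # The two states that agree with s everywhere except coordinate i.
            nbrs = [s[:i] + (v,) + s[i + 1:] for v in (0, 1)]
            z = sum(pi[index[t]] for t in nbrs)
            for t in nbrs:
                P[index[s], index[t]] = pi[index[t]] / z
        kernels.append(P)
    return kernels


def scan_kernels(pi, n):
    Ps = single_site_kernels(pi, n)
    systematic = functools.reduce(np.matmul, Ps)  # one sweep: P_1 P_2 ... P_n
    random_scan = sum(Ps) / n                      # one update at a uniformly random site
    return systematic, random_scan


def tv_from(P, pi, start, steps):
    """Total variation distance to pi after `steps` applications of P from state `start`."""
    dist = np.linalg.matrix_power(P, steps)[start]
    return 0.5 * np.abs(dist - pi).sum()


pi = ising_chain(3, 0.8)
systematic, random_scan = scan_kernels(pi, 3)
```

To compare the two orders on an equal budget of single-site updates, compare `tv_from(systematic, pi, s, t)` with `tv_from(random_scan, pi, s, n * t)`. For a model this small the difference is modest. The abstract's point is that for specially constructed models the gap can exceed a logarithmic factor.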
Statistics and Machine Learning Toolbox - MATLAB & Simulink
Statistics and Machine Learning Toolbox provides functions and apps to describe, analyze, and model data. You can use descriptive statistics and plots for exploratory data analysis, fit probability distributions to data, generate random numbers for Monte Carlo simulations, and perform hypothesis tests. Regression and classification algorithms let you draw inferences from data and build predictive models. For multidimensional data analysis, Statistics and Machine Learning Toolbox provides feature selection, stepwise regression, principal component analysis (PCA), regularization, and other dimensionality reduction methods that let you identify variables or features that impact your model. The toolbox provides supervised and unsupervised machine learning algorithms, including support vector machines (SVMs), boosted and bagged decision trees, k-nearest neighbor, k-means, k-medoids, hierarchical clustering, Gaussian mixture models, and hidden Markov models.
High-dimensional Filtering using Nested Sequential Monte Carlo
Naesseth, Christian A., Lindsten, Fredrik, Schön, Thomas B.
Inference in complex and high-dimensional statistical models is a very challenging problem that is ubiquitous in applications such as climate informatics [Monteleoni et al., 2013], bioinformatics [Cohen, 2004] and machine learning [Wainwright and Jordan, 2008], to mention a few. We are interested in sequential Bayesian inference in settings where we have a sequence of posterior distributions that we need to compute. Specifically, we focus on settings where the model (or state variable) is high-dimensional but has local dependencies. One example of the type of model we consider is the so-called spatiotemporal model [Wikle, 2015, Cressie and Wikle, 2011, Rue and Held, 2005]. Sequential Monte Carlo (SMC) methods comprise one of the most successful methodologies for sequential Bayesian inference. However, SMC struggles in high dimensions, and these methods are rarely used for dimensions higher than, say, ten [Rebeschini and van Handel, 2015].
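For reference, here is a minimal sketch of the standard SMC building block, a bootstrap particle filter (propagate, weight, resample), assuming NumPy. It runs on an assumed 1-D linear-Gaussian model, where the exact Kalman filter provides ground truth. This is plain SMC, not the nested SMC method of the paper. In high dimensions the importance weights in this loop degenerate, which is the difficulty the abstract refers to. The model parameters and the names `bootstrap_pf` and `kalman_means` are illustrative.

```python
import numpy as np


def bootstrap_pf(y, a, q, r, n_particles, rng):
    """Filtering means E[x_t | y_1..y_t] for x_t = a x_{t-1} + N(0, q),
    y_t = x_t + N(0, r), x_0 ~ N(0, 1), via a bootstrap particle filter."""
    x = rng.normal(0.0, 1.0, n_particles)  # samples of x_0
    means = []
    for yt in y:
        x = a * x + rng.normal(0.0, np.sqrt(q), n_particles)  # propagate with the prior dynamics
        logw = -0.5 * (yt - x) ** 2 / r                        # Gaussian log-likelihood up to a constant
        w = np.exp(logw - logw.max())
        w /= w.sum()
        means.append(np.sum(w * x))
        x = x[rng.choice(n_particles, size=n_particles, p=w)]  # multinomial resampling
    return np.array(means)


def kalman_means(y, a, q, r):
    """Exact filtering means for the same linear-Gaussian model."""
    m, p = 0.0, 1.0
    out = []
    for yt in y:
        m, p = a * m, a * a * p + q                     # predict
        gain = p / (p + r)
        m, p = m + gain * (yt - m), (1.0 - gain) * p    # update
        out.append(m)
    return np.array(out)


rng = np.random.default_rng(0)
a, q, r = 0.9, 1.0, 1.0
x, ys = rng.normal(), []
for _ in range(20):
    x = a * x + rng.normal(0.0, np.sqrt(q))
    ys.append(x + rng.normal(0.0, np.sqrt(r)))
ys = np.array(ys)
pf = bootstrap_pf(ys, a, q, r, 20_000, np.random.default_rng(1))
kf = kalman_means(ys, a, q, r)
```

With 20,000 particles the particle-filter means should track the Kalman means closely in this one-dimensional case.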
Selecting Bases in Spectral Learning of Predictive State Representations via Model Entropy
Predictive State Representations (PSRs) are powerful techniques for modelling dynamical systems; they represent a state as a vector of predictions about future observable events (tests). A fundamental problem in PSRs is learning the PSR model of the underlying system. Recently, spectral methods have been used successfully to address this problem by treating learning as the task of computing a singular value decomposition (SVD) over a submatrix of a special matrix called the Hankel matrix. Under the assumptions that the rows and columns of the Hankel submatrix are sufficient (which usually requires a very large number of rows and columns, and almost always fails in practice) and that the entries of the matrix can be estimated accurately, the spectral approach to learning PSRs has been proven statistically consistent, and the learned parameters converge to the true parameters. In practice, however, limited computational resources mean that only a finite set of rows or columns can be used for spectral learning. Because different sets of columns usually lead to learned models of different accuracy, in this paper we propose an approach for selecting the set of columns, namely basis selection, that adopts a concept of model entropy to measure the accuracy of the learned model. Experimental results demonstrate the effectiveness of the proposed approach.
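The Hankel-SVD step can be sketched as follows, assuming NumPy. The sketch uses the weighted-automaton form of the spectral algorithm, which is closely related to, but not identical with, the PSR formulation in the paper. Given a fixed basis of prefixes (rows) and suffixes (columns), it takes a rank-k SVD of the Hankel submatrix H and recovers the model parameters up to a change of basis. The basis here is simply all strings of length at most 2, the kind of fixed choice that basis selection is meant to replace. The target automaton and all function names are hypothetical.

```python
import itertools

import numpy as np


def wfa_value(alpha, B, beta, word):
    """f(word) = alpha^T B[w_1] ... B[w_n] beta for a weighted automaton."""
    v = alpha
    for c in word:
        v = v @ B[c]
    return float(v @ beta)


def hankel_blocks(f, prefixes, suffixes, alphabet):
    """Hankel submatrix H[p, s] = f(p + s) and shifted blocks H_c[p, s] = f(p + c + s)."""
    H = np.array([[f(p + s) for s in suffixes] for p in prefixes])
    Hc = {c: np.array([[f(p + c + s) for s in suffixes] for p in prefixes]) for c in alphabet}
    h_row = np.array([f(s) for s in suffixes])  # empty-prefix row
    h_col = np.array([f(p) for p in prefixes])  # empty-suffix column
    return H, Hc, h_row, h_col


def spectral_learn(H, Hc, h_row, h_col, rank):
    """Recover (alpha, B, beta), up to a change of basis, from a rank-`rank` SVD of H."""
    _, _, vt = np.linalg.svd(H)
    V = vt[:rank].T                    # top right singular vectors
    F_pinv = np.linalg.pinv(H @ V)
    alpha = h_row @ V
    beta = F_pinv @ h_col
    B = {c: F_pinv @ Hc[c] @ V for c in Hc}
    return alpha, B, beta


# Hypothetical 2-state target automaton over the alphabet {a, b}.
alpha0 = np.array([1.0, 0.0])
B0 = {"a": np.array([[0.5, 0.2], [0.1, 0.3]]), "b": np.array([[0.1, 0.4], [0.3, 0.2]])}
beta0 = np.array([0.2, 0.6])
target = lambda w: wfa_value(alpha0, B0, beta0, w)

# Basis: all strings of length <= 2, used for both prefixes and suffixes.
basis = ["".join(t) for n in range(3) for t in itertools.product("ab", repeat=n)]
alpha1, B1, beta1 = spectral_learn(*hankel_blocks(target, basis, basis, "ab"), rank=2)
```

Here the Hankel entries are exact and the basis gives a submatrix of full rank 2. In that case the learned automaton reproduces the target even on strings longer than any in the basis. When the entries are estimated from data, the result depends on which rows and columns are kept.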