
Markov Models


Latent Bayesian melding for integrating individual and population models

arXiv.org Machine Learning

In many statistical problems, a more coarse-grained model may be suitable for population-level behaviour, whereas a more detailed model is appropriate for accurate modelling of individual behaviour. This raises the question of how to integrate both types of models. Methods such as posterior regularization follow the idea of generalized moment matching, in that they allow matching expectations between two models, but sometimes both models are most conveniently expressed as latent variable models. We propose latent Bayesian melding, which is motivated by averaging the distributions over population statistics of both the individual-level and the population-level models under a logarithmic opinion pool framework. In a case study on electricity disaggregation, which is a type of single-channel blind source separation problem, we show that latent Bayesian melding leads to significantly more accurate predictions than an approach based solely on generalized moment matching.
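
The logarithmic opinion pool combines distributions by a weighted geometric mean, $p(x) \propto \prod_i p_i(x)^{w_i}$. The following is a minimal sketch for discrete distributions over a finite set of values of a population statistic. It is not the latent Bayesian melding procedure itself, which pools distributions induced by latent variable models; the function name `log_opinion_pool` and the example numbers are hypothetical.

```python
import numpy as np

def log_opinion_pool(dists, weights):
    """Pool discrete distributions over the same finite support:
    p(x) is proportional to prod_i p_i(x) ** w_i, with positive weights summing to 1."""
    dists = np.asarray(dists, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if np.any(weights <= 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must be positive and sum to 1")
    # Weighted sum of log-probabilities; zero probabilities map to -inf and stay excluded.
    with np.errstate(divide="ignore"):
        log_p = weights @ np.log(dists)
    log_p = log_p - log_p.max()  # shift before exponentiating for numerical stability
    pooled = np.exp(log_p)
    return pooled / pooled.sum()

# Hypothetical individual-level and population-level beliefs about a binary statistic.
print(log_opinion_pool([[0.5, 0.5], [0.9, 0.1]], [0.5, 0.5]))
```

With equal weights, pooling [0.5, 0.5] and [0.9, 0.1] gives [0.75, 0.25] up to floating-point rounding: the pooled distribution leans towards the more confident opinion, and any value that one model assigns zero probability stays at zero.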


Approximate Counting in SMT and Value Estimation for Probabilistic Programs

arXiv.org Artificial Intelligence

#SMT, or model counting for logical theories, is a well-known hard problem that generalizes such tasks as counting the number of satisfying assignments to a Boolean formula and computing the volume of a polytope. In the realm of satisfiability modulo theories (SMT) there is a growing need for model counting solvers, coming from several application domains (quantitative information flow, static analysis of probabilistic programs). In this paper, we show a reduction from an approximate version of #SMT to SMT. We focus on the theories of integer arithmetic and linear real arithmetic. We propose model counting algorithms that provide approximate solutions with formal bounds on the approximation error. They run in polynomial time and make a polynomial number of queries to the SMT solver for the underlying theory, exploiting "for free" the sophisticated heuristics implemented within modern SMT solvers. We have implemented the algorithms and used them to solve the value problem for a model of loop-free probabilistic programs with nondeterminism.
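
One well-known way to reduce approximate counting to satisfiability queries is hashing: add random XOR (parity) constraints until the surviving solutions fall below a small threshold, then scale the count back up by $2^m$ for $m$ constraints. The abstract does not say this is exactly the reduction used in the paper, so treat the sketch as an illustration of the general idea only. It works on Boolean assignments rather than integer or linear real arithmetic, a brute-force enumerator (`count_capped`, hypothetical) stands in for the SMT solver, and the threshold `cap` and number of `trials` are arbitrary choices.

```python
import random
from itertools import product
from statistics import median

def count_capped(formula, n_vars, xors, cap):
    """Brute-force stand-in for a solver: count assignments that satisfy
    `formula` and every XOR constraint, stopping early once `cap` is reached."""
    count = 0
    for bits in product((0, 1), repeat=n_vars):
        if not formula(bits):
            continue
        if all(sum(bits[i] for i in idx) % 2 == rhs for idx, rhs in xors):
            count += 1
            if count >= cap:
                break
    return count

def approx_count(formula, n_vars, cap=24, trials=9, seed=0):
    """Hashing-based estimate of the number of satisfying assignments."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        xors = []  # each constraint: (variable indices, parity bit)
        while True:
            c = count_capped(formula, n_vars, xors, cap)
            if c < cap or len(xors) >= n_vars:
                # Each random XOR halves the solution set on average.
                estimates.append(c * 2 ** len(xors))
                break
            idx = [i for i in range(n_vars) if rng.random() < 0.5]
            xors.append((idx, rng.randint(0, 1)))
    return median(estimates)

print(approx_count(lambda b: b[0] or b[1], n_vars=10))
```

For the formula $x_0 \lor x_1$ over 10 variables the exact count is $3 \cdot 2^8 = 768$. A single trial can be off by a factor of two or more; taking the median over independent trials is what makes the estimate concentrate.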


Covariance-Controlled Adaptive Langevin Thermostat for Large-Scale Bayesian Sampling

arXiv.org Machine Learning

Monte Carlo sampling for Bayesian posterior inference is a common approach used in machine learning. The Markov Chain Monte Carlo procedures that are used are often discrete-time analogues of associated stochastic differential equations (SDEs). These SDEs are guaranteed to leave invariant the required posterior distribution. An area of current research addresses the computational benefits of stochastic gradient methods in this setting. Existing techniques rely on estimating the variance or covariance of the subsampling error, and typically assume constant variance. In this article, we propose a covariance-controlled adaptive Langevin thermostat that can effectively dissipate parameter-dependent noise while maintaining a desired target distribution. The proposed method achieves a substantial speedup over popular alternative schemes for large-scale machine learning applications.
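
A sketch of the adaptive Langevin (Nosé-Hoover-Langevin) thermostat that covariance-controlled schemes build on: a thermostat variable `xi` acts as an adaptive friction that absorbs unknown gradient noise so that the momentum stays at its target temperature. This is not the CCAdL method itself, which additionally uses an estimate of the parameter-dependent noise covariance; that term is omitted here. The update order, step size `h`, noise amplitude `A`, thermostat mass `mu`, and the toy standard-normal target with artificial gradient noise are all assumptions for illustration.

```python
import math
import numpy as np

def ad_langevin(grad_noisy, theta0, n_steps, h=0.05, A=1.0, mu=1.0, seed=0):
    """Adaptive Langevin thermostat for a scalar parameter driven by a
    stochastic gradient; returns the trajectory of theta."""
    rng = np.random.default_rng(seed)
    theta, p, xi = float(theta0), 0.0, A
    kicks = rng.standard_normal(n_steps).tolist()
    amp = math.sqrt(2.0 * A * h)
    samples = np.empty(n_steps)
    for t in range(n_steps):
        # Momentum: noisy force, adaptive friction xi, injected noise of size A.
        p += -h * grad_noisy(theta, rng) - h * xi * p + amp * kicks[t]
        theta += h * p
        # Thermostat: xi rises while the kinetic energy p**2 exceeds its target
        # of 1 (unit temperature) and falls otherwise, absorbing gradient noise.
        xi += h * (p * p - 1.0) / mu
        samples[t] = theta
    return samples

def noisy_grad(theta, rng):
    # Gradient of U(theta) = theta**2 / 2 (standard normal target) plus noise.
    return theta + 2.0 * rng.standard_normal()

draws = ad_langevin(noisy_grad, theta0=3.0, n_steps=200_000)[10_000:]
print(draws.mean(), draws.var())
```

Despite the gradient noise, the retained draws should have mean close to 0 and variance close to 1, up to Monte Carlo error and a small step-size bias.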


Unsupervised Incremental Learning and Prediction of Music Signals

arXiv.org Machine Learning

A system is presented that segments, clusters and predicts musical audio in an unsupervised manner, adjusting the number of (timbre) clusters instantaneously to the audio input. A sequence learning algorithm adapts its structure to a dynamically changing clustering tree. The flow of the system is as follows: 1) segmentation by onset detection, 2) timbre representation of each segment by Mel frequency cepstrum coefficients, 3) discretization by incremental clustering, yielding a tree of different sound classes (e.g. instruments) that can grow or shrink on the fly driven by the instantaneous sound events, resulting in a discrete symbol sequence, 4) extraction of statistical regularities of the symbol sequence, using hierarchical N-grams and the newly introduced conceptual Boltzmann machine, and 5) prediction of the next sound event in the sequence. The system's robustness is assessed with respect to complexity and noisiness of the signal. Clustering in isolation yields an adjusted Rand index (ARI) of 82.7% / 85.7% for data sets of singing voice and drums. Onset detection jointly with clustering achieves an ARI of 81.3% / 76.3%, and the prediction of the entire system yields an ARI of 27.2% / 39.2%.
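
The adjusted Rand index compares two partitions by counting pairs of items, corrected for chance so that random labelings score about 0 and identical partitions score 1 (the abstract reports it as a percentage). A minimal pair-counting implementation from the contingency table, with a hypothetical function name:

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index computed from pair counts of the contingency table."""
    n = len(labels_true)
    pairs = Counter(zip(labels_true, labels_pred))  # contingency table cells
    rows = Counter(labels_true)                     # row sums
    cols = Counter(labels_pred)                     # column sums
    sum_ij = sum(comb(c, 2) for c in pairs.values())
    sum_a = sum(comb(c, 2) for c in rows.values())
    sum_b = sum(comb(c, 2) for c in cols.values())
    expected = sum_a * sum_b / comb(n, 2)           # chance level of sum_ij
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. one cluster in both labelings
        return 1.0
    return (sum_ij - expected) / (max_index - expected)

print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))
```

Label names do not matter: swapping cluster ids still scores 1.0, while the crossed partition scores -0.5, i.e. worse than chance.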


Application of Quantum Annealing to Training of Deep Neural Networks

arXiv.org Machine Learning

In Deep Learning, a well-known approach for training a Deep Neural Network starts by training a generative Deep Belief Network model, typically using Contrastive Divergence (CD), then fine-tuning the weights using backpropagation or other discriminative techniques. However, the generative training can be time-consuming due to the slow mixing of Gibbs sampling. We investigated an alternative approach that estimates model expectations of Restricted Boltzmann Machines using samples from a D-Wave quantum annealing machine. We tested this method on a coarse-grained version of the MNIST data set. In our tests we found that the quantum sampling-based training approach achieves comparable or better accuracy with significantly fewer iterations of generative training than conventional CD-based training. Further investigation is needed to determine whether similar improvements can be achieved for other data sets, and to what extent these improvements can be attributed to quantum effects.
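
For reference, the conventional CD-1 update that the quantum-sampling approach is compared against can be sketched for a binary RBM as follows. The negative phase (`v1`, `ph1`) approximates the model expectation with a single Gibbs step; that model expectation is the quantity the paper instead estimates from annealer samples. Layer sizes, learning rate, and the random toy data are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_gradient(W, b, c, v0, rng):
    """CD-1 estimate of the log-likelihood gradient for a binary RBM with
    weights W (n_vis x n_hid), visible bias b and hidden bias c, on a
    minibatch v0 (batch x n_vis)."""
    ph0 = sigmoid(v0 @ W + c)                          # hidden probs given data
    h0 = (rng.random(ph0.shape) < ph0).astype(float)   # sample hidden units
    pv1 = sigmoid(h0 @ W.T + b)                        # one Gibbs step back
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + c)
    n = v0.shape[0]
    # Data term minus the one-step approximation of the model expectation.
    dW = (v0.T @ ph0 - v1.T @ ph1) / n
    db = (v0 - v1).mean(axis=0)
    dc = (ph0 - ph1).mean(axis=0)
    return dW, db, dc

rng = np.random.default_rng(0)
n_vis, n_hid = 6, 3
W = 0.01 * rng.standard_normal((n_vis, n_hid))
b, c = np.zeros(n_vis), np.zeros(n_hid)
data = (rng.random((20, n_vis)) < 0.5).astype(float)
for _ in range(100):
    dW, db, dc = cd1_gradient(W, b, c, data, rng)
    W += 0.1 * dW
    b += 0.1 * db
    c += 0.1 * dc
```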


A Bounded $p$-norm Approximation of Max-Convolution for Sub-Quadratic Bayesian Inference on Additive Factors

arXiv.org Machine Learning

Max-convolution is an important problem closely resembling standard convolution; as such, max-convolution occurs frequently across many fields. Here we extend the method with the fastest known worst-case runtime, which can be applied to nonnegative vectors by numerically approximating the Chebyshev norm $\| \cdot \|_\infty$, and use this approach to derive two numerically stable methods based on the idea of computing $p$-norms via fast convolution. The first method, with runtime in $O( k \log(k) \log(\log(k)) )$ (which is less than $18 k \log(k)$ for any vectors that can be practically realized), uses the $p$-norm as a direct approximation of the Chebyshev norm. The second method, with runtime in $O( k \log(k) )$ (although in practice both perform similarly), uses a novel null space projection method, which extracts information from a sequence of $p$-norms to estimate the maximum value in the vector (this is equivalent to querying a small number of moments from a distribution of bounded support in order to estimate the maximum). The $p$-norm approaches are compared to one another and are shown to compute an approximation of the Viterbi path in a hidden Markov model where the transition matrix is a Toeplitz matrix; the runtime of approximating the Viterbi path is thus reduced from $O( n k^2 )$ steps to $O( n k \log(k) )$ steps in practice, which is demonstrated by inferring the U.S. unemployment rate from the S&P 500 stock index.
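
The idea behind the first method can be sketched as follows. For nonnegative vectors, the $p$-norm of the products $x_i y_{m-i}$ upper-bounds their maximum and is at most $k^{1/p}$ times it, and raising entries to the power $p$ turns the whole computation into one ordinary convolution that an FFT performs in $O(k \log k)$. This sketch uses a fixed, hypothetical `p` and only a simple rescaling for stability; it does not reproduce the paper's numerical safeguards or the null space projection method.

```python
import numpy as np

def max_convolve_exact(x, y):
    """Naive O(k^2) max-convolution: z[m] = max_i x[i] * y[m - i]."""
    z = np.zeros(len(x) + len(y) - 1)
    for i, xi in enumerate(x):
        z[i:i + len(y)] = np.maximum(z[i:i + len(y)], xi * y)
    return z

def max_convolve_pnorm(x, y, p=16.0):
    """Approximate max-convolution of nonnegative vectors with one FFT
    convolution, using the p-norm of each product vector in place of its max."""
    sx, sy = x.max(), y.max()
    xs, ys = (x / sx) ** p, (y / sy) ** p   # rescale into [0, 1] to avoid overflow
    n = len(x) + len(y) - 1
    conv = np.fft.irfft(np.fft.rfft(xs, n) * np.fft.rfft(ys, n), n)
    conv = np.clip(conv, 0.0, None)         # drop tiny negative FFT round-off
    return sx * sy * conv ** (1.0 / p)

x = np.linspace(0.5, 1.0, 16)
y = np.linspace(1.0, 0.5, 16)
ratio = max_convolve_pnorm(x, y) / max_convolve_exact(x, y)
print(ratio.min(), ratio.max())  # in theory between 1 and 16 ** (1 / 16)
```

Larger `p` tightens the bound but pushes small entries towards FFT round-off, which is why the example vectors are bounded away from zero.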


A latent shared-component generative model for real-time disease surveillance using Twitter data

arXiv.org Machine Learning

Exploiting the large amount of available data to address relevant social problems has been one of the key challenges in data mining. Such efforts have recently been named "data science for social good" and have attracted the attention of several researchers and institutions. In this paper we contribute to this effort by considering a difficult public health problem: the timely monitoring of dengue epidemics in small geographical areas. We develop a simple yet effective generative model that connects the fluctuations of disease cases and disease-related Twitter posts. We consider a hidden Markov process that drives both the fluctuations in reported dengue cases and the tweets issued in each region. We add a stable but random source of tweets to represent the posts when no disease cases are recorded. The model is learned through a Markov chain Monte Carlo algorithm that produces the posterior distribution of the relevant parameters. Using data from a significant number of large Brazilian towns, we demonstrate empirically that our model, using the tweets and disease cases jointly, predicts the disease counts for the following weeks well.
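
To make the structure of such a shared-component model concrete, here is a toy forward simulation in which a two-state hidden Markov chain (for example, low activity versus epidemic) drives both Poisson-distributed case counts and disease-related tweets, plus a constant Poisson background source of tweets. The number of states, the Poisson emissions, and every rate are hypothetical; the MCMC fitting described in the abstract is not shown.

```python
import numpy as np

def simulate(n_weeks, P, case_rate, tweet_rate, background, seed=0):
    """Toy shared-component generator: hidden states drive cases and tweets;
    a background Poisson source adds tweets regardless of disease activity."""
    rng = np.random.default_rng(seed)
    states = np.zeros(n_weeks, dtype=int)
    for t in range(1, n_weeks):
        states[t] = rng.choice(len(P), p=P[states[t - 1]])
    cases = rng.poisson(case_rate[states])
    tweets = rng.poisson(tweet_rate[states]) + rng.poisson(background, size=n_weeks)
    return states, cases, tweets

P = np.array([[0.9, 0.1], [0.2, 0.8]])  # hypothetical weekly transition matrix
states, cases, tweets = simulate(
    200, P, np.array([2.0, 50.0]), np.array([5.0, 100.0]), background=10.0
)
print(np.corrcoef(cases, tweets)[0, 1])
```

Because both series share the hidden state, they are strongly correlated even though tweets never depend on the observed case counts directly.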


Accelerometer based Activity Classification with Variational Inference on Sticky HDP-SLDS

arXiv.org Machine Learning

As part of the daily monitoring of human activities, wearable sensors and devices are becoming increasingly popular sources of data. With the advent of smartphones equipped with an accelerometer, gyroscope and camera, it is now possible to develop activity classification platforms that everyone can use conveniently. In this paper, we propose a fast inference method for an unsupervised non-parametric time series model, namely variational inference for the sticky HDP-SLDS (Hierarchical Dirichlet Process Switching Linear Dynamical System). We show that the proposed algorithm can differentiate various indoor activities such as sitting, walking, turning, going up/down the stairs and taking the elevator using only the accelerometer of an Android smartphone (Samsung Galaxy S4). We used the front camera of the smartphone to annotate activity types precisely. We compared the proposed method with Hidden Markov Models with Gaussian emission probabilities on a dataset of 10 subjects, and we show the efficacy of the stickiness property. We further compared variational inference to the Gibbs sampler on the same model and show that variational inference is faster by one order of magnitude.
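
The stickiness property can be illustrated with the transition prior of a sticky HDP-HMM under a finite (weak-limit) truncation to $K$ states: row $j$ is drawn as $\pi_j \sim \mathrm{Dir}(\alpha\beta + \kappa e_j)$, so the extra mass $\kappa$ on the diagonal gives an expected self-transition probability of $(\alpha\beta_j + \kappa)/(\alpha + \kappa)$. This is a sketch of the prior only, assuming the sticky HDP-SLDS uses the same construction for its mode sequence; $K$, $\alpha$, $\kappa$ and the uniform $\beta$ are arbitrary, and neither the linear dynamics nor the variational updates are shown.

```python
import numpy as np

def sample_sticky_transitions(beta, alpha, kappa, rng):
    """Draw a K x K transition matrix whose row j is
    Dirichlet(alpha * beta + kappa * e_j) (weak-limit sticky HDP-HMM prior)."""
    K = len(beta)
    rows = [rng.dirichlet(alpha * beta + kappa * np.eye(K)[j]) for j in range(K)]
    return np.vstack(rows)

rng = np.random.default_rng(0)
beta = np.full(5, 0.2)  # hypothetical global state weights
for kappa in (0.0, 50.0):
    P = sample_sticky_transitions(beta, alpha=1.0, kappa=kappa, rng=rng)
    print(kappa, np.diag(P).mean())
```

With $\kappa = 0$ the expected self-transition probability is $\beta_j = 0.2$; with $\kappa = 50$ it rises to $(0.2 + 50)/51 \approx 0.98$, which discourages rapid switching between activities.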


Data-Driven Learning of the Number of States in Multi-State Autoregressive Models

arXiv.org Machine Learning

In this work, we consider the class of multi-state autoregressive processes that can be used to model non-stationary time series of interest. In order to capture different autoregressive (AR) states underlying an observed time series, it is crucial to select the appropriate number of states. We propose a new model selection technique based on the gap statistic, which uses a null reference distribution on the stable AR filters to check whether adding a new AR state significantly improves the performance of the model. To that end, we define a new distance measure between AR filters based on mean squared prediction error (MSPE), and propose an efficient method to generate random stable filters that are uniformly distributed in the coefficient space. Numerical results are provided to evaluate the performance of the proposed approach.
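
An AR($p$) filter $x_t = \sum_{k=1}^{p} a_k x_{t-k} + e_t$ is stable when all roots of $z^p - a_1 z^{p-1} - \dots - a_p$ lie inside the unit circle. Since each $a_k$ is then a signed elementary symmetric polynomial of those roots, $|a_k| \le \binom{p}{k}$, so rejection sampling from that box yields filters uniformly distributed over the stable region. This simple rejection sampler is a stand-in, not the efficient generator proposed in the paper, and its acceptance rate drops quickly as $p$ grows. Function names are hypothetical.

```python
import numpy as np
from math import comb

def is_stable(a):
    """True if x_t = sum_k a[k-1] * x_{t-k} + e_t is stable, i.e. every root of
    z^p - a_1 z^(p-1) - ... - a_p lies strictly inside the unit circle."""
    roots = np.roots(np.concatenate(([1.0], -np.asarray(a, dtype=float))))
    return bool(np.all(np.abs(roots) < 1.0))

def random_stable_ar(p, rng):
    """Uniform draw from the stable AR(p) coefficient region by rejection
    from the bounding box |a_k| <= C(p, k)."""
    bounds = np.array([comb(p, k) for k in range(1, p + 1)], dtype=float)
    while True:
        a = rng.uniform(-bounds, bounds)
        if is_stable(a):
            return a

rng = np.random.default_rng(0)
print(random_stable_ar(3, rng))
```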


Texture Modelling with Nested High-order Markov-Gibbs Random Fields

arXiv.org Machine Learning

Currently, Markov-Gibbs random field (MGRF) image models which include high-order interactions are almost always built by modelling responses of a stack of local linear filters. Actual interaction structure is specified implicitly by the filter coefficients. In contrast, we learn an explicit high-order MGRF structure by considering the learning process in terms of general exponential family distributions nested over base models, so that potentials added later can build on previous ones. We add new features relatively rapidly by skipping the costly optimisation of parameters. We introduce the use of local binary patterns as features in MGRF texture models, and generalise them by learning offsets to the surrounding pixels. These prove effective as high-order features, and are fast to compute. Several schemes for selecting high-order features by composition or search of a small subclass are compared. Additionally, we present a simple modification of the maximum likelihood objective, specific to texture modelling, which aims to improve generalisation by local windowing of statistics. The proposed method was experimentally evaluated by learning high-order MGRF models for a broad selection of complex textures and then performing texture synthesis, and succeeded on much of the continuum from stochastic through irregularly structured to near-regular textures. Learning interaction structure is very beneficial for textures with large-scale structure, although those with complex irregular structure still provide difficulties. The texture models were also quantitatively evaluated on two tasks and found to be competitive with other works: grading of synthesised textures by a panel of observers, and comparison against several recent MGRF models by evaluation on a constrained inpainting task.
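
A basic local binary pattern assigns each pixel an 8-bit code, one bit per neighbour, set when that neighbour is at least as bright as the centre. In the sketch the neighbour positions are an explicit `offsets` argument, which mirrors the abstract's idea of generalising LBPs by learning offsets; the offset learning itself and the use of the codes as MGRF features are not shown. The names and the clockwise bit ordering are assumptions.

```python
import numpy as np

# Hypothetical default: the 8 immediate neighbours, clockwise from top-left.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_codes(img, offsets=OFFSETS):
    """Local binary pattern code for every pixel far enough from the border:
    bit k is set when the pixel at offsets[k] is >= the centre pixel."""
    img = np.asarray(img)
    r = max(max(abs(dy), abs(dx)) for dy, dx in offsets)
    h, w = img.shape
    centre = img[r:h - r, r:w - r]
    codes = np.zeros(centre.shape, dtype=np.int64)
    for k, (dy, dx) in enumerate(offsets):
        neighbour = img[r + dy:h - r + dy, r + dx:w - r + dx]
        codes |= (neighbour >= centre).astype(np.int64) << k
    return codes

print(lbp_codes(np.array([[9, 0, 0], [0, 5, 0], [0, 0, 0]])))
```

Only the top-left neighbour (bit 0) is at least as bright as the centre in this example, so its code is 1; a flat patch gives 255 because every comparison succeeds.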