 Markov Models


Asymmetric Distributions from Constrained Mixtures

arXiv.org Machine Learning

This paper introduces constrained mixtures for continuous distributions, characterized by a mixture of distributions where each distribution has a shape similar to the base distribution and disjoint domains. This new concept is used to create generalized asymmetric versions of the Laplace and normal distributions, which are shown to define exponential families, with known conjugate priors, and to have maximum likelihood estimates for the original parameters, with known closed-form expressions. The asymmetric and symmetric normal distributions are compared in a linear regression example, showing that the asymmetric version performs at least as well as the symmetric one, and in a real world time-series problem, where a hidden Markov model is used to fit a stock index, indicating that the asymmetric version provides higher likelihood and may learn distribution models over states and transition distributions with considerably less entropy.
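The abstract does not give the paper's parametrization, so the following is only a minimal sketch of the general idea, assuming a two-piece normal: two half-normal shapes with disjoint domains that meet at a common location `mu`, with hypothetical scale parameters `sigma_left` and `sigma_right`. The numerical check confirms that the pieces form a proper density and that each side's mass is proportional to its scale.

```python
import math

def asymmetric_normal_pdf(x, mu, sigma_left, sigma_right):
    """Two-piece normal density (assumed form, not taken from the paper).

    The left piece lives on (-inf, mu), the right piece on [mu, inf).
    """
    # A shared normalizer makes the two pieces meet continuously at mu.
    norm = 2.0 / (math.sqrt(2.0 * math.pi) * (sigma_left + sigma_right))
    sigma = sigma_left if x < mu else sigma_right
    return norm * math.exp(-0.5 * ((x - mu) / sigma) ** 2)

def integrate(f, lo, hi, n=20000):
    """Composite trapezoid rule on [lo, hi] with n intervals."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        total += f(lo + i * h)
    return total * h

def pdf(x):
    # Example parameters chosen only for illustration.
    return asymmetric_normal_pdf(x, mu=0.0, sigma_left=1.0, sigma_right=3.0)

total_mass = integrate(pdf, -40.0, 40.0)  # should be close to 1
left_mass = integrate(pdf, -40.0, 0.0)    # close to sigma_left / (sigma_left + sigma_right)
```

With equal scales the density reduces to the ordinary normal density, which is the sense in which this form generalizes the symmetric case.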


Shared latent subspace modelling within Gaussian-Binary Restricted Boltzmann Machines for NIST i-Vector Challenge 2014

arXiv.org Machine Learning

This paper presents a novel approach to speaker subspace modelling based on Gaussian-Binary Restricted Boltzmann Machines (GRBM). The proposed model is based on the idea of shared factors, as in Probabilistic Linear Discriminant Analysis (PLDA). The GRBM hidden layer is divided into speaker and channel factors, where the speaker factor is shared over all vectors of the speaker. Maximum likelihood parameter estimation (MLE) for the proposed model is then introduced. Various new scoring techniques for speaker verification using GRBM are proposed. Results on the NIST i-vector Challenge 2014 dataset are presented.


Variance-Constrained Actor-Critic Algorithms for Discounted and Average Reward MDPs

arXiv.org Machine Learning

In many sequential decision-making problems we may want to manage risk by minimizing some measure of variability in rewards in addition to maximizing a standard criterion. Variance related risk measures are among the most common risk-sensitive criteria in finance and operations research. However, optimizing many such criteria is known to be a hard problem. In this paper, we consider both discounted and average reward Markov decision processes. For each formulation, we first define a measure of variability for a policy, which in turn gives us a set of risk-sensitive criteria to optimize. For each of these criteria, we derive a formula for computing its gradient. We then devise actor-critic algorithms that operate on three timescales - a TD critic on the fastest timescale, a policy gradient (actor) on the intermediate timescale, and a dual ascent for Lagrange multipliers on the slowest timescale. In the discounted setting, we point out the difficulty in estimating the gradient of the variance of the return and incorporate simultaneous perturbation approaches to alleviate this. The average setting, on the other hand, allows for an actor update using compatible features to estimate the gradient of the variance. We establish the convergence of our algorithms to locally risk-sensitive optimal policies. Finally, we demonstrate the usefulness of our algorithms in a traffic signal control application.
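The three-timescale structure described above is usually obtained by giving each update its own decaying step size, with the slower updates decaying faster. The sketch below shows one such schedule; the exponents are hypothetical choices for illustration and are not taken from the paper.

```python
def step_sizes(t):
    """Return (critic, actor, multiplier) step sizes at iteration t >= 1.

    Assumed schedule: polynomial decay with exponents in (0.5, 1], so each
    sequence sums to infinity while its squares sum to a finite value.
    """
    critic = t ** -0.55      # fastest timescale: TD critic
    actor = t ** -0.8        # intermediate timescale: policy gradient
    multiplier = 1.0 / t     # slowest timescale: dual ascent on the Lagrange multiplier
    return critic, actor, multiplier

def timescale_ratios(t):
    """Ratios that should shrink toward 0 as t grows (timescale separation)."""
    critic, actor, multiplier = step_sizes(t)
    return actor / critic, multiplier / actor

ratios_early = timescale_ratios(10)
ratios_late = timescale_ratios(100000)
```

Because both ratios tend to zero, the critic sees a nearly fixed policy and the actor sees a nearly fixed multiplier, which is the usual intuition behind multi-timescale convergence arguments.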


CORPP: Commonsense Reasoning and Probabilistic Planning, as Applied to Dialog with a Mobile Robot

AAAI Conferences

In order to be fully robust and responsive to a dynamically changing real-world environment, intelligent robots will need to engage in a variety of simultaneous reasoning modalities. In particular, in this paper we consider their needs to i) reason with commonsense knowledge, ii) model their nondeterministic action outcomes and partial observability, and iii) plan toward maximizing long-term rewards. On one hand, Answer Set Programming (ASP) is good at representing and reasoning with commonsense and default knowledge, but is ill-equipped to plan under probabilistic uncertainty. On the other hand, Partially Observable Markov Decision Processes (POMDPs) are strong at planning under uncertainty toward maximizing long-term rewards, but are not designed to incorporate commonsense knowledge and inference. This paper introduces the CORPP algorithm which combines P-log, a probabilistic extension of ASP, with POMDPs to integrate commonsense reasoning with planning under uncertainty. Our approach is fully implemented and tested on a shopping request identification problem both in simulation and on a real robot. Compared with existing approaches using P-log or POMDPs individually, we observe significant improvements in both efficiency and accuracy.


A Probabilistic Extension of the Stable Model Semantics

AAAI Conferences

We present a probabilistic extension of logic programs under the stable model semantics, inspired by the idea of Markov Logic Networks. The proposed language, called LP^MLN, is a generalization of logic programs under the stable model semantics, and as such, embraces the rich body of research in knowledge representation. The language is also a generalization of ProbLog, and is closely related to Markov Logic Networks, which implies that the computation can be carried out by the techniques developed for them. LP^MLN appears to be a natural language for probabilistic answer set programming, and as an example we show how an elaboration tolerant representation of transition systems in answer set programs can be naturally extended to the probabilistic setting.


Scalable Latent Tree Model and its Application to Health Analytics

arXiv.org Machine Learning

Latent tree graphical models are a popular class of latent variable models, where a probability distribution involving observed and hidden variables is Markovian on a tree. Because the structure of the (observable and hidden) variable interactions is approximated as a tree, inference on latent trees can be carried out exactly through simple belief propagation [Pea88]. Therefore, latent tree graphical models present a good tradeoff between model accuracy and computational complexity. They are applicable in many domains where it is natural to expect hierarchical or sequential relationships among the variables (through a hidden Markov model). For instance, latent tree models have been employed for phylogenetic reconstruction [DEKM99], object recognition [CTW12a, CTW12b] and human pose estimation [WL13]. In this paper, we use a latent tree model to discover a hierarchy among diseases based on comorbidities exhibited in patients' health records, i.e. co-occurrences of diseases in patients. In particular, two large healthcare datasets of 30K and 1.6M patients are used to build the latent disease trees, where clinically meaningful disease clusters are identified as shown in fig 3 and 4. The task of learning a latent tree model consists of two parts: learning the tree structure, and learning the parameters of the tree. There exist many challenges which prohibit efficient or guaranteed learning of the latent tree graphical model, which will be addressed in this paper: 1. The location and the number of latent variables are hidden and the marginalized graph over the observable variables no longer conforms to a tree structure.


An Adaptive Online HDP-HMM for Segmentation and Classification of Sequential Data

arXiv.org Machine Learning

The joint problem of time segmentation and recognition of sequential data into meaningful subsequences has attracted significant research in a variety of domains. The ability to automatically segment and classify data is a core technology for applications like speaker diarisation, finance, activity understanding, multimedia annotation and human-computer interaction. To date, the main proposed solutions have included sliding windows [1], the hidden Markov model (HMM) [2], conditional random fields [3] [4], and structural SVM [5], covering the spectrum of generative, discriminative and maximum-margin dynamic classifiers. Along with advancements in learning and inference, research has witnessed increasingly realistic datasets which are bridging the gap between lab and real applications [6] [7]. Nevertheless, important challenges such as model adaptation and dynamic class sets remain unresolved. We address both these limitations by an adaptive online model that can accommodate an unlimited (theoretically infinite) number of classes. In a nutshell, this is achieved by applying a Bayesian nonparametric model, the hierarchical Dirichlet process (HDP), as the prior for a hidden Markov model (a model known as HDP-HMM [8] [9]), and exploiting an adaptive learning rate for model adaptation. The proposed model provides an adaptive online learning approach for joint segmentation and recognition of sequential data with incremental class sets and we refer to it as ADON HDP-HMM in the following.


Qualitative inequalities for squared partial correlations of a Gaussian random vector

arXiv.org Machine Learning

We describe various sets of conditional independence relationships, sufficient for qualitatively comparing non-vanishing squared partial correlations of a Gaussian random vector. These sufficient conditions are satisfied by several graphical Markov models. Rules for comparing degree of association among the vertices of such Gaussian graphical models are also developed. We apply these rules to compare conditional dependencies on Gaussian trees. In particular for trees, we show that such dependence can be completely characterized by the length of the paths joining the dependent vertices to each other and to the vertices conditioned on. We also apply our results to postulate rules for model selection for polytree models. Our rules apply to mutual information of Gaussian random vectors as well.
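As a small illustration of the kind of comparison discussed above (not the paper's rules themselves), consider the simplest Gaussian tree, a chain 1 - 2 - 3. The sketch uses the standard first-order partial correlation formula; the correlation values are made up for the example.

```python
import math

def partial_corr(r_ij, r_ik, r_jk):
    """Partial correlation of variables i and j given k (first-order formula)."""
    return (r_ij - r_ik * r_jk) / math.sqrt((1.0 - r_ik ** 2) * (1.0 - r_jk ** 2))

# Hypothetical edge correlations on the chain 1 - 2 - 3.
r12, r23 = 0.8, 0.5
# On a Gaussian chain the end-to-end correlation is the product along the path.
r13 = r12 * r23

# Conditioning on the middle vertex separates the endpoints.
r13_given_2 = partial_corr(r13, r12, r23)

# Conditioning on a leaf weakens, but does not remove, the edge dependence.
r12_given_3 = partial_corr(r12, r13, r23)
```

Here the squared correlation shrinks as the path gets longer (`r13 ** 2` is smaller than `r12 ** 2`), and `r13_given_2` vanishes because vertex 2 lies on the only path between 1 and 3.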


Geometry and Expressive Power of Conditional Restricted Boltzmann Machines

arXiv.org Machine Learning

Conditional restricted Boltzmann machines are undirected stochastic neural networks with a layer of input and output units connected bipartitely to a layer of hidden units. These networks define models of conditional probability distributions on the states of the output units given the states of the input units, parametrized by interaction weights and biases. We address the representational power of these models, proving results on their ability to represent conditional Markov random fields and conditional distributions with restricted supports, the minimal size of universal approximators, the maximal model approximation errors, and on the dimension of the set of representable conditional distributions. We contribute new tools for investigating conditional probability models, which allow us to improve the results that can be derived from existing work on restricted Boltzmann machine probability models.


L_1-regularized Boltzmann machine learning using majorizer minimization

arXiv.org Machine Learning

We propose an inference method to estimate sparse interactions and biases according to Boltzmann machine learning. The basis of this method is $L_1$ regularization, which is often used in compressed sensing, a technique for reconstructing sparse input signals from undersampled outputs. $L_1$ regularization impedes the simple application of the gradient method, which optimizes the cost function that leads to accurate estimations, owing to the cost function's lack of smoothness. In this study, we utilize the majorizer minimization method, which is a well-known technique implemented in optimization problems, to avoid the non-smoothness of the cost function. By using the majorizer minimization method, we elucidate essentially relevant biases and interactions from given data with seemingly strongly-correlated components.
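The abstract does not state which majorizer is used, so the sketch below assumes one common choice: the smooth part of the cost is majorized by a quadratic, and minimizing that majorizer plus the $L_1$ penalty gives a soft-thresholding update. A toy quadratic cost stands in for the Boltzmann machine likelihood, and all function names are hypothetical.

```python
def soft_threshold(x, t):
    """Minimizer of 0.5 * (w - x)**2 + t * abs(w) over w."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0

def mm_l1(grad, w0, lam, lipschitz, n_iter=200):
    """Majorize-minimize iterations for smooth_cost(w) + lam * ||w||_1.

    grad: gradient of the smooth cost; lipschitz: a Lipschitz bound on that
    gradient, which makes the quadratic surrogate a valid majorizer.
    """
    w = list(w0)
    for _ in range(n_iter):
        g = grad(w)
        # Minimizing the quadratic majorizer plus the L1 term separates
        # per coordinate and reduces to soft thresholding.
        w = [soft_threshold(wi - gi / lipschitz, lam / lipschitz)
             for wi, gi in zip(w, g)]
    return w

# Toy smooth cost 0.5 * ||w - a||^2 with gradient w - a (Lipschitz constant 1).
a = [3.0, -0.2, 0.5]

def toy_grad(w):
    return [wi - ai for wi, ai in zip(w, a)]

w_hat = mm_l1(toy_grad, [0.0, 0.0, 0.0], lam=1.0, lipschitz=1.0)
```

Each iteration only ever minimizes a smooth-plus-separable surrogate, so the non-smoothness of the $L_1$ term never enters a gradient computation; small coordinates of `a` are set exactly to zero, which is the sparsity effect the penalty is meant to produce.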