Goto

Collaborating Authors

 Undirected Networks


Markov Chains on Orbits of Permutation Groups

arXiv.org Artificial Intelligence

We present a novel approach to detecting and utilizing symmetries in probabilistic graphical models with two main contributions. First, we present a scalable approach to computing generating sets of permutation groups representing the symmetries of graphical models. Second, we introduce orbital Markov chains, a novel family of Markov chains leveraging model symmetries to reduce mixing times. We establish an insightful connection between model symmetries and rapid mixing of orbital Markov chains. Thus, we present the first lifted MCMC algorithm for probabilistic graphical models. Both analytical and empirical results demonstrate the effectiveness and efficiency of the approach.


Sensitivity analysis for finite Markov chains in discrete time

arXiv.org Artificial Intelligence

When the initial and transition probabilities of a finite Markov chain in discrete time are not well known, we should perform a sensitivity analysis. This is done by considering as basic uncertainty models the so-called credal sets that these probabilities are known or believed to belong to, and by allowing the probabilities to vary over such sets. This leads to the definition of an imprecise Markov chain. We show that the time evolution of such a system can be studied very efficiently using so-called lower and upper expectations. We also study how the inferred credal set about the state at time n evolves as n->infinity: under quite unrestrictive conditions, it converges to a uniquely invariant credal set, regardless of the credal set given for the initial state. This leads to a non-trivial generalisation of the classical Perron-Frobenius Theorem to imprecise Markov chains.


MCMC for Hierarchical Semi-Markov Conditional Random Fields

arXiv.org Machine Learning

Deep architecture such as hierarchical semi-Markov models is an important class of models for nested sequential data. Current exact inference schemes either cost cubic time in sequence length, or exponential time in model depth. These costs are prohibitive for large-scale problems with arbitrary length and depth. In this contribution, we propose a new approximation technique that may have the potential to achieve sub-cubic time complexity in length and linear time depth, at the cost of some loss of quality. The idea is based on two well-known methods: Gibbs sampling and Rao-Blackwellisation. We provide some simulation-based evaluation of the quality of the RGBS with respect to run time and sequence length.


Boosted Markov Networks for Activity Recognition

arXiv.org Machine Learning

Recognising human activities using sensors is currently a major challenge in research. Typically, the information extracted directly from sensors is either not discriminative enough or is too noisy to infer activities occurring in the scene. Human activities are complex and evolve dynamically over time. Temporal probabilistic models such as hidden Markov models (HMMs) and dynamic Bayesian networks (DBNs) have been the dominant models used to solve the problem [1, 4, 19]. However, these models make a strong assumption in the generative process by which the data is generated in the model. This makes the representation of complex sensor data very difficult, and possibly results in large models. Markov networks (MNs) (also known as Markov random fields) offer an alternative approach, especially in form of conditional random fields (CRFs) [10]. In CRFs, the observation is not modelled, and so we have the freedom to incorporate overlapping features, multiple sensor fusion, and long-range dependencies into the model.


Mixed-Variate Restricted Boltzmann Machines

arXiv.org Machine Learning

Restricted Boltzmann Machines (RBM) [9, 5] have recently attracted an increasing attention for their rich capacity in a variety of learning tasks, including multivariate distribution modelling, feature extraction, classification, and construction of deep architectures [8, 19]. An RBM is a two-layer Markov random field in which the visible layer represents observed variables and the hidden layer represents latent aspects of the data. Pairwise interactions are only permitted for units between layers. As a result, the posterior distribution over the hidden variables and the probability of the data generative model are easy to evaluate, allowing fast feature extraction and efficient sampling-based inference [7]. Nonetheless, most existing work in RBMs implicitly assumes that the visible layer contains variables of the same modality. By far the most popular input types are binary [5] and Gaussian [8]. Recent extension includes categorical [21], ordinal [25], Poisson [6] and Beta [13] data. To the best of our knowledge, none has been considered for multicategorical and category-ranking data, nor for a mixed combination of these data types. In this paper, we investigate a generalisation of the RBM for variables of multiple modalities and types.


Human Activity Learning and Segmentation using Partially Hidden Discriminative Models

arXiv.org Machine Learning

Learning and understanding the typical patterns in the daily activities and routines of people from low-level sensory data is an important problem in many application domains such as building smart environments, or providing intelligent assistance. Traditional approaches to this problem typically rely on supervised learning and generative models such as the hidden Markov models and its extensions. While activity data can be readily acquired from pervasive sensors, e.g. in smart environments, providing manual labels to support supervised training is often extremely expensive. In this paper, we propose a new approach based on semi-supervised training of partially hidden discriminative models such as the conditional random field (CRF) and the maximum entropy Markov model (MEMM). We show that these models allow us to incorporate both labeled and unlabeled data for learning, and at the same time, provide us with the flexibility and accuracy of the discriminative framework. Our experimental results in the video surveillance domain illustrate that these models can perform better than their generative counterpart, the partially hidden Markov model, even when a substantial amount of labels are unavailable.


Conditional Restricted Boltzmann Machines for Cold Start Recommendations

arXiv.org Machine Learning

Restricted Boltzman Machines (RBMs) have been successfully used in recommender systems. However, as with most of other collaborative filtering techniques, it cannot solve cold start problems for there is no rating for a new item. In this paper, we first apply conditional RBM (CRBM) which could take extra information into account and show that CRBM could solve cold start problem very well, especially for rating prediction task. CRBM naturally combine the content and collaborative data under a single framework which could be fitted effectively. Experiments show that CRBM can be compared favourably with matrix factorization models, while hidden features learned from the former models are more easy to be interpreted.


Thurstonian Boltzmann Machines: Learning from Multiple Inequalities

arXiv.org Machine Learning

Restricted Boltzmann machines (RBMs) have proved to be a versatile tool for a wide variety of machine learning tasks and as a building block for deep architectures [12, 24, 28]. The original proposals mainly handle binary visible and hidden units. Whilst binary hidden units are broadly applicable as feature detectors, non-binary visible data requires different designs. Recent extensions to other data types result in type-dependent models: the Gaussian for continuous inputs [12], Beta for bounded continuous inputs [16], Poisson for count data [9], multinomial for unordered categories [25], and ordinal models for ordered categories [37, 35]. The Boltzmann distribution permits several types to be jointly modelled, thus making the RBM a good tool for multimodal and complex social survey analysis. The work of [20, 29, 40] combines continuous (e.g., visual and audio) and discrete modalities (e.g., words). The work of [34] extends the idea further to incorporate ordinal and rank data. However, there are conceptual drawbacks: First, conditioned on the hidden layer, they are still separate type-specific models; second, handling ordered categories and ranks is not natural; and third, specifying direct correlation between these types remains difficult. The main thesis of this paper is that many data types can be captured in one unified model.


Cumulative Restricted Boltzmann Machines for Ordinal Matrix Data Analysis

arXiv.org Machine Learning

Restricted Boltzmann machines (RBMs) [36, 9, 20] have recently attracted significant interest due to their versatility in a variety of unsupervised and supervised learning tasks [35, 18, 25], and in building deep architectures [14, 31]. A RBM is a bipartite undirected model that captures the generative process in which a data vector is generated from a binary hidden vector. The bipartite architecture enables very fast data encoding and sampling-based inference; and together with recent advances in learning procedures, we can now process massive data with large models [13, 37, 2]. This paper presents our contributions in developing RBM specifications as well as learning and inference procedures for multivariate ordinal data. This extends and consolidates the reach of RBMs to a wide range of user-generated domains - social responses, recommender systems, product/paper reviews, and expert assessments of health and ecosystems indicators.


Probabilistic Inference in Credal Networks: New Complexity Results

Journal of Artificial Intelligence Research

Credal networks are graph-based statistical models whose parameters take values in a set, instead of being sharply specified as in traditional statistical models (e.g., Bayesian networks). The computational complexity of inferences on such models depends on the irrelevance/independence concept adopted. In this paper, we study inferential complexity under the concepts of epistemic irrelevance and strong independence. We show that inferences under strong independence are NP-hard even in trees with binary variables except for a single ternary one. We prove that under epistemic irrelevance the polynomial-time complexity of inferences in credal trees is not likely to extend to more general models (e.g., singly connected topologies). These results clearly distinguish networks that admit efficient inferences and those where inferences are most likely hard, and settle several open questions regarding their computational complexity. We show that these results remain valid even if we disallow the use of zero probabilities. We also show that the computation of bounds on the probability of the future state in a hidden Markov model is the same whether we assume epistemic irrelevance or strong independence, and we prove a similar result for inference in naive Bayes structures. These inferential equivalences are important for practitioners, as hidden Markov models and naive Bayes structures are used in real applications of imprecise probability.