 Learning Graphical Models


Human Activity Learning and Segmentation using Partially Hidden Discriminative Models

arXiv.org Machine Learning

Learning and understanding the typical patterns in people's daily activities and routines from low-level sensory data is an important problem in many application domains, such as building smart environments or providing intelligent assistance. Traditional approaches to this problem typically rely on supervised learning and generative models such as the hidden Markov model and its extensions. While activity data can be readily acquired from pervasive sensors, e.g. in smart environments, providing manual labels to support supervised training is often extremely expensive. In this paper, we propose a new approach based on semi-supervised training of partially hidden discriminative models such as the conditional random field (CRF) and the maximum entropy Markov model (MEMM). We show that these models allow us to incorporate both labeled and unlabeled data for learning while retaining the flexibility and accuracy of the discriminative framework. Our experimental results in the video surveillance domain illustrate that these models can perform better than their generative counterpart, the partially hidden Markov model, even when a substantial number of labels are unavailable.


A Bayesian estimation approach to analyze non-Gaussian data-generating processes with latent classes

arXiv.org Machine Learning

Several methods have recently been proposed to discover a complete causal structure, that is, all the causal directions, under the assumption that disturbance variables have non-Gaussian distributions. However, the estimation results can be biased if there are "latent classes." Latent classes are unobserved discrete variables that have more than one observed child variable. In the presence of latent classes, data generated from different processes are mixed. Several methods have been proposed to estimate the causal structure in the presence of latent classes [12], but all of these are affected by local optima. Therefore, in this paper, we propose a new estimation approach that can solve this problem. The structure of this paper is as follows.


Conditional Restricted Boltzmann Machines for Cold Start Recommendations

arXiv.org Machine Learning

Restricted Boltzmann Machines (RBMs) have been successfully used in recommender systems. However, like most other collaborative filtering techniques, they cannot solve the cold start problem, because a new item has no ratings. In this paper, we first apply the conditional RBM (CRBM), which can take extra information into account, and show that the CRBM handles the cold start problem very well, especially for the rating prediction task. The CRBM naturally combines content and collaborative data under a single framework that can be fitted effectively. Experiments show that the CRBM compares favourably with matrix factorization models, while the hidden features it learns are easier to interpret.


Thurstonian Boltzmann Machines: Learning from Multiple Inequalities

arXiv.org Machine Learning

Restricted Boltzmann machines (RBMs) have proved to be a versatile tool for a wide variety of machine learning tasks and as a building block for deep architectures [12, 24, 28]. The original proposals mainly handle binary visible and hidden units. Whilst binary hidden units are broadly applicable as feature detectors, non-binary visible data requires different designs. Recent extensions to other data types result in type-dependent models: the Gaussian for continuous inputs [12], Beta for bounded continuous inputs [16], Poisson for count data [9], multinomial for unordered categories [25], and ordinal models for ordered categories [37, 35]. The Boltzmann distribution permits several types to be jointly modelled, thus making the RBM a good tool for multimodal and complex social survey analysis. The work of [20, 29, 40] combines continuous (e.g., visual and audio) and discrete modalities (e.g., words). The work of [34] extends the idea further to incorporate ordinal and rank data. However, there are conceptual drawbacks: First, conditioned on the hidden layer, they are still separate type-specific models; second, handling ordered categories and ranks is not natural; and third, specifying direct correlation between these types remains difficult. The main thesis of this paper is that many data types can be captured in one unified model.


Cumulative Restricted Boltzmann Machines for Ordinal Matrix Data Analysis

arXiv.org Machine Learning

Restricted Boltzmann machines (RBMs) [36, 9, 20] have recently attracted significant interest due to their versatility in a variety of unsupervised and supervised learning tasks [35, 18, 25], and in building deep architectures [14, 31]. An RBM is a bipartite undirected model that captures the generative process in which a data vector is generated from a binary hidden vector. The bipartite architecture enables very fast data encoding and sampling-based inference; together with recent advances in learning procedures, we can now process massive data with large models [13, 37, 2]. This paper presents our contributions in developing RBM specifications as well as learning and inference procedures for multivariate ordinal data. This extends and consolidates the reach of RBMs to a wide range of user-generated domains: social responses, recommender systems, product/paper reviews, and expert assessments of health and ecosystem indicators.
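
To illustrate why a bipartite architecture makes encoding and sampling fast, here is a minimal sketch of one block Gibbs step. It assumes a plain RBM with binary visible and binary hidden units, not the ordinal specification this paper develops, and every name in it (`hidden_probs`, `visible_probs`, `gibbs_step`, `W`, `b_vis`, `b_hid`) is hypothetical. Because there are no visible-visible or hidden-hidden connections, each layer factorizes given the other and can be sampled in a single pass.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hidden_probs(v, W, b_hid):
    # P(h_j = 1 | v); W[i][j] couples visible unit i with hidden unit j.
    return [
        sigmoid(b_hid[j] + sum(v[i] * W[i][j] for i in range(len(v))))
        for j in range(len(b_hid))
    ]


def visible_probs(h, W, b_vis):
    # P(v_i = 1 | h); the same weights are reused in the other direction.
    return [
        sigmoid(b_vis[i] + sum(W[i][j] * h[j] for j in range(len(h))))
        for i in range(len(b_vis))
    ]


def sample_bernoulli(probs, rng):
    # Draw each unit independently, which the bipartite structure permits.
    return [1 if rng.random() < p else 0 for p in probs]


def gibbs_step(v, W, b_vis, b_hid, rng):
    # One block Gibbs step: encode v into h, then resample v from h.
    h = sample_bernoulli(hidden_probs(v, W, b_hid), rng)
    v_new = sample_bernoulli(visible_probs(h, W, b_vis), rng)
    return v_new, h
```

Calling `hidden_probs` alone gives the deterministic encoding of a data vector; repeating `gibbs_step` yields samples of the kind used by sampling-based learning procedures.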


Learning From Ordered Sets and Applications in Collaborative Ranking

arXiv.org Machine Learning

Rankings over sets arise when users choose between groups of items. For example, a group may consist of the movies a user rates $5$ stars, or a customized tour package. It turns out that, to model this data type properly, we need to investigate the general combinatorics problem of partitioning a set and ordering the subsets. Here we construct a probabilistic log-linear model over a set of ordered subsets. Inference in this combinatorial space is highly challenging: the space size approaches $(N!/2)(\ln 2)^{-(N+1)}$, with $\ln 2 \approx 0.693147$, as $N$ approaches infinity. We propose a \texttt{split-and-merge} Metropolis-Hastings procedure that can explore the state-space efficiently. For discovering hidden aspects in the data, we enrich the model with latent binary variables so that the posteriors can be efficiently evaluated. Finally, we evaluate the proposed model on large-scale collaborative filtering tasks and demonstrate that it is competitive against state-of-the-art methods.
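
To give a feel for how quickly this space grows, the sketch below counts the ways to partition a set of $N$ items and order the resulting subsets. It assumes the state space is exactly the set of ordered set partitions, whose counts are the Fubini (ordered Bell) numbers, and compares the exact count with the standard asymptotic $N!/(2(\ln 2)^{N+1})$. The function names are hypothetical.

```python
from math import comb, factorial, log


def ordered_partition_counts(n_max):
    # a[n] = number of ordered set partitions of n items.
    # Recurrence: pick the k items forming the first subset, then order the rest.
    a = [1]
    for n in range(1, n_max + 1):
        a.append(sum(comb(n, k) * a[n - k] for k in range(1, n + 1)))
    return a


def asymptotic_count(n):
    # Standard asymptotic for the ordered Bell numbers.
    return factorial(n) / (2 * log(2) ** (n + 1))


counts = ordered_partition_counts(10)
for n in (3, 5, 10):
    print(n, counts[n], round(asymptotic_count(n), 1))
```

Even at $N = 10$ the count already exceeds one hundred million, which is why exhaustive enumeration is impractical and a sampling procedure is needed.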


Fast Bayesian Feature Selection for High Dimensional Linear Regression in Genomics via the Ising Approximation

arXiv.org Machine Learning

Feature selection, identifying a subset of variables that are relevant for predicting a response, is an important and challenging component of many methods in statistics and machine learning. Feature selection is especially difficult and computationally intensive when the number of variables approaches or exceeds the number of samples, as is often the case for many genomic datasets. Here, we introduce a new approach -- the Bayesian Ising Approximation (BIA) -- to rapidly calculate posterior probabilities for feature relevance in L2 penalized linear regression. In the regime where the regression problem is strongly regularized by the prior, we show that computing the marginal posterior probabilities for features is equivalent to computing the magnetizations of an Ising model. Using a mean field approximation, we show it is possible to rapidly compute the feature selection path described by the posterior probabilities as a function of the L2 penalty. We present simulations and analytical results illustrating the accuracy of the BIA on some simple regression problems. Finally, we demonstrate the applicability of the BIA to high dimensional regression by analyzing a gene expression dataset with nearly 30,000 features.
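
As a rough illustration of the mean field step, the sketch below solves the generic self-consistency equations $m_i = \tanh(h_i + \sum_{j \neq i} J_{ij} m_j)$ for an Ising model with spins $s_i = \pm 1$ by damped fixed-point iteration. How the BIA derives the fields `h` and couplings `J` from the regression data is not stated in this abstract, so both are treated as given inputs; the function names and the conversion $(1 + m_i)/2$ from magnetization to the probability that $s_i = +1$ are assumptions of this sketch rather than the authors' procedure.

```python
import math


def mean_field_magnetizations(h, J, n_iter=500, damping=0.5, tol=1e-12):
    # Damped fixed-point iteration for m_i = tanh(h_i + sum_{j != i} J[i][j] * m_j).
    n = len(h)
    m = [0.0] * n
    for _ in range(n_iter):
        target = [
            math.tanh(h[i] + sum(J[i][j] * m[j] for j in range(n) if j != i))
            for i in range(n)
        ]
        updated = [(1 - damping) * old + damping * new for old, new in zip(m, target)]
        change = max(abs(a - b) for a, b in zip(updated, m))
        m = updated
        if change < tol:
            break
    return m


def spin_up_probabilities(m):
    # For a +/-1 spin with mean m, P(s = +1) = (1 + m) / 2.
    return [(1 + mi) / 2 for mi in m]
```

Damping is used here only to keep the iteration stable when couplings are strong; with weak couplings the undamped update converges as well.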


Probabilistic Inference in Credal Networks: New Complexity Results

Journal of Artificial Intelligence Research

Credal networks are graph-based statistical models whose parameters take values in a set, instead of being sharply specified as in traditional statistical models (e.g., Bayesian networks). The computational complexity of inferences on such models depends on the irrelevance/independence concept adopted. In this paper, we study inferential complexity under the concepts of epistemic irrelevance and strong independence. We show that inferences under strong independence are NP-hard even in trees with binary variables except for a single ternary one. We prove that under epistemic irrelevance the polynomial-time complexity of inferences in credal trees is not likely to extend to more general models (e.g., singly connected topologies). These results clearly distinguish networks that admit efficient inferences and those where inferences are most likely hard, and settle several open questions regarding their computational complexity. We show that these results remain valid even if we disallow the use of zero probabilities. We also show that the computation of bounds on the probability of the future state in a hidden Markov model is the same whether we assume epistemic irrelevance or strong independence, and we prove a similar result for inference in naive Bayes structures. These inferential equivalences are important for practitioners, as hidden Markov models and naive Bayes structures are used in real applications of imprecise probability.


Fixed-Form Variational Posterior Approximation through Stochastic Linear Regression

arXiv.org Machine Learning

In Bayesian analysis the form of the posterior distribution is often not analytically tractable. To obtain quantities of interest under such a distribution, such as moments or marginal distributions, we typically need to use Monte Carlo methods or approximate the posterior with a more convenient distribution. A popular method of obtaining such an approximation is structured or fixed-form Variational Bayes, which works by numerically minimizing the Kullback-Leibler divergence of an approximating distribution in the exponential family to the intractable target distribution (Attias, 2000; Beal and Ghahramani, 2006; Jordan et al., 1999; Wainwright and Jordan, 2008). For certain problems, algorithms exist that can solve this optimization problem in much less time than it would take to approximate the posterior using Monte Carlo methods (see e.g.


Dependence versus Conditional Dependence in Local Causal Discovery from Gene Expression Data

arXiv.org Machine Learning

Motivation: Algorithms that discover variables which are causally related to a target may inform the design of experiments. With observational gene expression data, many methods discover causal variables by measuring each variable's degree of statistical dependence with the target using dependence measures (DMs). However, other methods measure each variable's ability to explain the statistical dependence between the target and the remaining variables in the data using conditional dependence measures (CDMs), since this strategy is guaranteed to find the target's direct causes, direct effects, and direct causes of the direct effects in the infinite sample limit. In this paper, we design a new algorithm in order to systematically compare the relative abilities of DMs and CDMs in discovering causal variables from gene expression data. Results: The proposed algorithm using a CDM is sample efficient, since it consistently outperforms other state-of-the-art local causal discovery algorithms when sample sizes are small. However, the proposed algorithm using a CDM outperforms the proposed algorithm using a DM only when sample sizes are above several hundred. These results suggest that accurate causal discovery from gene expression data using current CDM-based algorithms requires datasets with at least several hundred samples. Availability: The proposed algorithm is freely available at https://github.com/ericstrobl/DvCD.