Learning Graphical Models
Bayesian nonparametric multivariate convex regression
Hannah, Lauren A., Dunson, David B.
X, where f(x) is the gradient of f at x. This is called the convex regression problem. Convex regression can easily be modified to allow concave regression by multiplying all of the values by negative one. Convex regression problems are common in economics, operations research and reinforcement learning. In economics, production functions (Skiba 1978) and consumer preferences (Meyer & Pratt 1968) are often convex, while in operations research and reinforcement learning, value functions for stochastic optimization problems can be convex (Shapiro et al. 2009). If a problem is known to be convex, a convex regression estimate provides advantages over an unrestricted estimate. First, convexity is a powerful regularizer: it places strong conditions on the derivatives--and hence smoothness--of a function. Convexity constraints can substantially reduce overfitting and lead to more accurate predictions. Second, maintaining convexity allows the use of convex optimization solvers when the regression estimate is used in an objective function of an optimization problem. 1 Multivariate convex regression has received relatively little attention in the literature.
Controlling Complexity in Part-of-Speech Induction
Graca, J. V., Ganchev, K., Coheur, L., Pereira, F., Taskar, B.
We consider the problem of fully unsupervised learning of grammatical (part-of-speech) categories from unlabeled text. The standard maximum-likelihood hidden Markov model for this task performs poorly, because of its weak inductive bias and large model capacity. We address this problem by refining the model and modifying the learning objective to control its capacity via para- metric and non-parametric constraints. Our approach enforces word-category association sparsity, adds morphological and orthographic features, and eliminates hard-to-estimate parameters for rare words. We develop an efficient learning algorithm that is not much more computationally intensive than standard training. We also provide an open-source implementation of the algorithm. Our experiments on five diverse languages (Bulgarian, Danish, English, Portuguese, Spanish) achieve significant improvements compared with previous methods for the same task.
A Probabilistic Framework for Learning Kinematic Models of Articulated Objects
Sturm, J., Stachniss, C., Burgard, W.
Robots operating in domestic environments generally need to interact with articulated objects, such as doors, cabinets, dishwashers or fridges. In this work, we present a novel, probabilistic framework for modeling articulated objects as kinematic graphs. Vertices in this graph correspond to object parts, while edges between them model their kinematic relationship. In particular, we present a set of parametric and non-parametric edge models and how they can robustly be estimated from noisy pose observations. We furthermore describe how to estimate the kinematic structure and how to use the learned kinematic models for pose prediction and for robotic manipulation tasks. We finally present how the learned models can be generalized to new and previously unseen objects. In various experiments using real robots with different camera systems as well as in simulation, we show that our approach is valid, accurate and efficient. Further, we demonstrate that our approach has a broad set of applications, in particular for the emerging fields of mobile manipulation and service robotics.
Sparse Partitioning: Nonlinear regression with binary or tertiary predictors, with application to association studies
This paper presents Sparse Partitioning, a Bayesian method for identifying predictors that either individually or in combination with others affect a response variable. The method is designed for regression problems involving binary or tertiary predictors and allows the number of predictors to exceed the size of the sample, two properties which make it well suited for association studies. Sparse Partitioning differs from other regression methods by placing no restrictions on how the predictors may influence the response. To compensate for this generality, Sparse Partitioning implements a novel way of exploring the model space. It searches for high posterior probability partitions of the predictor set, where each partition defines groups of predictors that jointly influence the response. The result is a robust method that requires no prior knowledge of the true predictor--response relationship. Testing on simulated data suggests Sparse Partitioning will typically match the performance of an existing method on a data set which obeys the existing method's model assumptions. When these assumptions are violated, Sparse Partitioning will generally offer superior performance.
Structure Selection from Streaming Relational Data
Mihalkova, Lilyana, Moustafa, Walaa Eldin
Statistical relational learning techniques have been successfully applied in a wide range of relational domains. In most of these applications, the human designers capitalized on their background knowledge by following a trial-and-error trajectory, where relational features are manually defined by a human engineer, parameters are learned for those features on the training data, the resulting model is validated, and the cycle repeats as the engineer adjusts the set of features. This paper seeks to streamline application development in large relational domains by introducing a light-weight approach that efficiently evaluates relational features on pieces of the relational graph that are streamed to it one at a time. We evaluate our approach on two social media tasks and demonstrate that it leads to more accurate models that are learned faster.
Lifted Graphical Models: A Survey
Mihalkova, Lilyana, Getoor, Lise
This article presents a survey of work on lifted graphical models. We review a general form for a lifted graphical model, a par-factor graph, and show how a number of existing statistical relational representations map to this formalism. We discuss inference algorithms, including lifted inference algorithms, that efficiently compute the answers to probabilistic queries. We also review work in learning lifted graphical models from data. It is our belief that the need for statistical relational models (whether it goes by that name or another) will grow in the coming decades, as we are inundated with data which is a mix of structured and unstructured, with entities and relations extracted in a noisy manner from text, and with the need to reason effectively with this data. We hope that this synthesis of ideas from many different research groups will provide an accessible starting point for new researchers in this expanding field.
Feature Reinforcement Learning In Practice
Nguyen, Phuong, Sunehag, Peter, Hutter, Marcus
Following a recent surge in using history-based methods for resolving perceptual aliasing in reinforcement learning, we introduce an algorithm based on the feature reinforcement learning framework called PhiMDP. To create a practical algorithm we devise a stochastic search procedure for a class of context trees based on parallel tempering and a specialized proposal distribution. We provide the first empirical evaluation for PhiMDP. Our proposed algorithm achieves superior performance to the classical U-tree algorithm and the recent active-LZ algorithm, and is competitive with MC-AIXI-CTW that maintains a bayesian mixture over all context trees up to a chosen depth.We are encouraged by our ability to compete with this sophisticated method using an algorithm that simply picks one single model, and uses Q-learning on the corresponding MDP. Our PhiMDP algorithm is much simpler, yet consumes less time and memory. These results show promise for our future work on attacking more complex and larger problems.
A Machine Learning Perspective on Predictive Coding with PAQ
Knoll, Byron, de Freitas, Nando
PAQ8 is an open source lossless data compression algorithm that currently achieves the best compression rates on many benchmarks. This report presents a detailed description of PAQ8 from a statistical machine learning perspective. It shows that it is possible to understand some of the modules of PAQ8 and use this understanding to improve the method. However, intuitive statistical explanations of the behavior of other modules remain elusive. We hope the description in this report will be a starting point for discussions that will increase our understanding, lead to improvements to PAQ8, and facilitate a transfer of knowledge from PAQ8 to other machine learning methods, such a recurrent neural networks and stochastic memoizers. Finally, the report presents a broad range of new applications of PAQ to machine learning tasks including language modeling and adaptive text prediction, adaptive game playing, classification, and compression using features from the field of deep learning.
Overlapping Mixtures of Gaussian Processes for the Data Association Problem
Lázaro-Gredilla, Miguel, Van Vaerenbergh, Steven, Lawrence, Neil
In this work we introduce a mixture of GPs to address the data association problem, i.e. to label a group of observations according to the sources that generated them. Unlike several previously proposed GP mixtures, the novel mixture has the distinct characteristic of using no gating function to determine the association of samples and mixture components. Instead, all the GPs in the mixture are global and samples are clustered following "trajectories" across input space. We use a nonstandard variational Bayesian algorithm to efficiently recover sample labels and learn the hyperparameters. We show how multi-object tracking problems can be disambiguated and also explore the characteristics of the model in traditional regression settings. Keywords: Gaussian Processes, Marginalized Variational Inference, Bayesian Models 1. Introduction The data association problem arises in multi-target tracking scenarios.
Sparse Signal Recovery with Temporally Correlated Source Vectors Using Sparse Bayesian Learning
Zhang, Zhilin, Rao, Bhaskar D.
We address the sparse signal recovery problem in the context of multiple measurement vectors (MMV) when elements in each nonzero row of the solution matrix are temporally correlated. Existing algorithms do not consider such temporal correlations and thus their performance degrades significantly with the correlations. In this work, we propose a block sparse Bayesian learning framework which models the temporal correlations. In this framework we derive two sparse Bayesian learning (SBL) algorithms, which have superior recovery performance compared to existing algorithms, especially in the presence of high temporal correlations. Furthermore, our algorithms are better at handling highly underdetermined problems and require less row-sparsity on the solution matrix. We also provide analysis of the global and local minima of their cost function, and show that the SBL cost function has the very desirable property that the global minimum is at the sparsest solution to the MMV problem. Extensive experiments also provide some interesting results that motivate future theoretical research on the MMV model.