Statistical Learning
Controlling Complexity in Part-of-Speech Induction
Graca, J. V., Ganchev, K., Coheur, L., Pereira, F., Taskar, B.
We consider the problem of fully unsupervised learning of grammatical (part-of-speech) categories from unlabeled text. The standard maximum-likelihood hidden Markov model for this task performs poorly, because of its weak inductive bias and large model capacity. We address this problem by refining the model and modifying the learning objective to control its capacity via para- metric and non-parametric constraints. Our approach enforces word-category association sparsity, adds morphological and orthographic features, and eliminates hard-to-estimate parameters for rare words. We develop an efficient learning algorithm that is not much more computationally intensive than standard training. We also provide an open-source implementation of the algorithm. Our experiments on five diverse languages (Bulgarian, Danish, English, Portuguese, Spanish) achieve significant improvements compared with previous methods for the same task.
A Probabilistic Framework for Learning Kinematic Models of Articulated Objects
Sturm, J., Stachniss, C., Burgard, W.
Robots operating in domestic environments generally need to interact with articulated objects, such as doors, cabinets, dishwashers or fridges. In this work, we present a novel, probabilistic framework for modeling articulated objects as kinematic graphs. Vertices in this graph correspond to object parts, while edges between them model their kinematic relationship. In particular, we present a set of parametric and non-parametric edge models and how they can robustly be estimated from noisy pose observations. We furthermore describe how to estimate the kinematic structure and how to use the learned kinematic models for pose prediction and for robotic manipulation tasks. We finally present how the learned models can be generalized to new and previously unseen objects. In various experiments using real robots with different camera systems as well as in simulation, we show that our approach is valid, accurate and efficient. Further, we demonstrate that our approach has a broad set of applications, in particular for the emerging fields of mobile manipulation and service robotics.
Dimension Reduction Using Rule Ensemble Machine Learning Methods: A Numerical Study of Three Ensemble Methods
DeMasi, Orianna, Meza, Juan, Bailey, David H.
Ensemble methods for supervised machine learning have become popular due to their ability to accurately predict class labels with groups of simple, lightweight "base learners." While ensembles offer computationally efficient models that have good predictive capability they tend to be large and offer little insight into the patterns or structure in a dataset. We consider an ensemble technique that returns a model of ranked rules. The model accurately predicts class labels and has the advantage of indicating which parameter constraints are most useful for predicting those labels. An example of the rule ensemble method successfully ranking rules and selecting attributes is given with a dataset containing images of potential supernovas where the number of necessary features is reduced from 39 to 21. We also compare the rule ensemble method on a set of multi-class problems with boosting and bagging, which are two well known ensemble techniques that use decision trees as base learners, but do not have a rule ranking scheme.
Sparse Partitioning: Nonlinear regression with binary or tertiary predictors, with application to association studies
This paper presents Sparse Partitioning, a Bayesian method for identifying predictors that either individually or in combination with others affect a response variable. The method is designed for regression problems involving binary or tertiary predictors and allows the number of predictors to exceed the size of the sample, two properties which make it well suited for association studies. Sparse Partitioning differs from other regression methods by placing no restrictions on how the predictors may influence the response. To compensate for this generality, Sparse Partitioning implements a novel way of exploring the model space. It searches for high posterior probability partitions of the predictor set, where each partition defines groups of predictors that jointly influence the response. The result is a robust method that requires no prior knowledge of the true predictor--response relationship. Testing on simulated data suggests Sparse Partitioning will typically match the performance of an existing method on a data set which obeys the existing method's model assumptions. When these assumptions are violated, Sparse Partitioning will generally offer superior performance.
Lifted Graphical Models: A Survey
Mihalkova, Lilyana, Getoor, Lise
This article presents a survey of work on lifted graphical models. We review a general form for a lifted graphical model, a par-factor graph, and show how a number of existing statistical relational representations map to this formalism. We discuss inference algorithms, including lifted inference algorithms, that efficiently compute the answers to probabilistic queries. We also review work in learning lifted graphical models from data. It is our belief that the need for statistical relational models (whether it goes by that name or another) will grow in the coming decades, as we are inundated with data which is a mix of structured and unstructured, with entities and relations extracted in a noisy manner from text, and with the need to reason effectively with this data. We hope that this synthesis of ideas from many different research groups will provide an accessible starting point for new researchers in this expanding field.
Prediction of peptide bonding affinity: kernel methods for nonlinear modeling
Bergeron, Charles, Hepburn, Theresa, Sundling, C. Matthew, Krein, Michael, Katt, Bill, Sukumar, Nagamani, Breneman, Curt M., Bennett, Kristin P.
Comparative Evaluation of Prediction Algorithms(COEPRA, http://www.coepra.org/) is a modeling competition organized to provide objective testing of various algorithms via the process of blind prediction for chemical, biological, and medical data. COEPRA's stated goals are to advance modeling algorithms and software as well as provide reference datasets to the research community. Transferable Atom Equivalent (TAE) RECON features are electron-density derived descriptors obtained by fragment reconstruction. MOE features are geometrical, structural, physiochemical and topological 2D descriptors.
Low-rank Matrix Recovery from Errors and Erasures
Chen, Yudong, Jalali, Ali, Sanghavi, Sujay, Caramanis, Constantine
Low-rank matrices play a central role in large-scale data analysis and dimensionality reduction. They arise in a variety of application areas, among them Principal Component Analysis (PCA), Multidimensional scaling (MDS), Spectral Clustering and related methods, ranking and collaborative filtering, etc. In all these problems, low-rank structure is used to either approximate a general matrix, or to correct for corrupted or missing data. This paper considers the recovery of a low-rank matrix in the simultaneous presence of (a) erasures: most elements are not observed, and (b): errors: among the ones that are observed, a significant fraction at unknown locations are grossly/maliciously corrupted. It is now well recognized that the standard, popular approach to low-rank matrix recovery using SVD as a first step fails spectacularly in this setting [1]. Low-rank matrix completion, which considers only random erasures ([2], [3]) will also fail with even just a few maliciously corrupted entries. In light of this, several recent works have studied an alternate approach based on the natural convex relaxation of minimizing rank plus support. One approach [4], [5] provides deterministic/worst case guarantees for the fully observed setting (i.e.
Submodular Optimization for Efficient Semi-supervised Support Vector Machines
Emara, Wael, Kantardzic, Mehmed
Abstract--In this work we present a quadratic programming approximation of the Semi-Supervised Support V ector Machine (S3VM) problem, namely approximate QP-S3VM, that can be efficiently solved using off the shelf optimization packages. We prove that this approximate formulation establishes a relation between the low density separation and the graph-based models of semi-supervised learning (SSL) which is important to develop a unifying framework for semi-supervised learning methods. Furthermore, we propose the novel idea of representing SSL problems as submodular set functions and use efficient sub-modular optimization algorithms to solve them. Using this new idea we develop a representation of the approximate QP-S3VM as a maximization of a submodular set function which makes it possible to optimize using efficient greedy algorithms. We demonstrate that the proposed methods are accurate and provide significant improvement in time complexity over the state of the art in the literature. The recent advances in information technology imposes serious challenges on traditional machine learning algorithms where classification models are trained using labeled samples. Data collection and storage nowadays has never been easier and therefore using such enormous volumes of data to infer reliable classification models is of utmost importance.
Toward Parts-Based Scene Understanding with Pixel-Support Parts-Sparse Pictorial Structures
Scene understanding remains a significant challenge in the computer vision community. The visual psychophysics literature has demonstrated the importance of interdependence among parts of the scene. Yet, the majority of methods in computer vision remain local. Pictorial structures have arisen as a fundamental parts-based model for some vision problems, such as articulated object detection. However, the form of classical pictorial structures limits their applicability for global problems, such as semantic pixel labeling. In this paper, we propose an extension of the pictorial structures approach, called pixel-support parts-sparse pictorial structures, or PS3, to overcome this limitation. Our model extends the classical form in two ways: first, it defines parts directly based on pixel-support rather than in a parametric form, and second, it specifies a space of plausible parts-based scene models and permits one to be used for inference on any given image. PS3 makes strides toward unifying object-level and pixel-level modeling of scene elements. In this report, we implement the first half of our model and rely upon external knowledge to provide an initial graph structure for a given image. Our experimental results on benchmark datasets demonstrate the capability of this new parts-based view of scene modeling.
Stability Conditions for Online Learnability
Ross, Stephane, Bagnell, J. Andrew
Stability is a general notion that quantifies the sensitivity of a learning algorithm's output to small change in the training dataset (e.g. deletion or replacement of a single training sample). Such conditions have recently been shown to be more powerful to characterize learnability in the general learning setting under i.i.d. samples where uniform convergence is not necessary for learnability, but where stability is both sufficient and necessary for learnability. We here show that similar stability conditions are also sufficient for online learnability, i.e. whether there exists a learning algorithm such that under any sequence of examples (potentially chosen adversarially) produces a sequence of hypotheses that has no regret in the limit with respect to the best hypothesis in hindsight. We introduce online stability, a stability condition related to uniform-leave-one-out stability in the batch setting, that is sufficient for online learnability. In particular we show that popular classes of online learners, namely algorithms that fall in the category of Follow-the-(Regularized)-Leader, Mirror Descent, gradient-based methods and randomized algorithms like Weighted Majority and Hedge, are guaranteed to have no regret if they have such online stability property. We provide examples that suggest the existence of an algorithm with such stability condition might in fact be necessary for online learnability. For the more restricted binary classification setting, we establish that such stability condition is in fact both sufficient and necessary. We also show that for a large class of online learnable problems in the general learning setting, namely those with a notion of sub-exponential covering, no-regret online algorithms that have such stability condition exists.