Supervised Learning
Convex Multiple-Instance Learning by Estimating Likelihood Ratio
Li, Fuxin, Sminchisescu, Cristian
Multiple-Instance learning has been long known as a hard non-convex problem. In this work, we propose an approach that recasts it as a convex likelihood ratio estimation problem. Firstly, the constraint in multiple-instance learning is reformulated into a convex constraint on the likelihood ratio. Then we show that a joint estimation of a likelihood ratio function and the likelihood on training instances can be learned convexly. Theoretically, we prove a quantitative relationship between the risk estimated under the 0-1 classification loss, and under a loss function for likelihood ratio estimation. It is shown that our likelihood ratio estimation is generally a good surrogate for the 0-1 loss, and separates positive and negative instances well. However with the joint estimation it tends to underestimate the likelihood of an example to be positive. We propose to use these likelihood ratio estimates as features, and learn a linear combination on them to classify the bags. Experiments on synthetic and real datasets show the superiority of the approach.
A Primal-Dual Message-Passing Algorithm for Approximated Large Scale Structured Prediction
In this paper we propose an approximated learning framework for large scale graphical models and derive message passing algorithms for learning their parameters efficiently. We first relate CRFs and structured SVMs and show that in the CRF's primal a variant of the log-partition function, known as soft-max, smoothly approximates the hinge loss function of structured SVMs. We then propose an intuitive approximation for structured prediction problems using Fenchel duality based on a local entropy approximation that computes the exact gradients of the approximated problem and is guaranteed to converge. Unlike existing approaches, this allow us to learn graphical models with cycles and very large number of parameters efficiently. We demonstrate the effectiveness of our approach in an image denoising task. This task was previously solved by sharing parameters across cliques. In contrast, our algorithm is able to efficiently learn large number of parameters resulting in orders of magnitude better prediction.
Online Multiple Kernel Learning for Structured Prediction
Martins, Andre F. T., Figueiredo, Mario A. T., Aguiar, Pedro M. Q., Smith, Noah A., Xing, Eric P.
Despite the recent progress towards efficient multiple kernel learning (MKL), the structured output case remains an open research front. Current approaches involve repeatedly solving a batch learning problem, which makes them inadequate for large scale scenarios. We propose a new family of online proximal algorithms for MKL (as well as for group-lasso and variants thereof), which overcomes that drawback. We show regret, convergence, and generalization bounds for the proposed method. Experiments on handwriting recognition and dependency parsing testify for the successfulness of the approach.
Cause Identification from Aviation Safety Incident Reports via Weakly Supervised Semantic Lexicon Construction
Abedin, M. A., Ng, V., Khan, L.
The Aviation Safety Reporting System collects voluntarily submitted reports on aviation safety incidents to facilitate research work aiming to reduce such incidents. To effectively reduce these incidents, it is vital to accurately identify why these incidents occurred. More precisely, given a set of possible causes, or shaping factors, this task of cause identification involves identifying all and only those shaping factors that are responsible for the incidents described in a report. We investigate two approaches to cause identification. Both approaches exploit information provided by a semantic lexicon, which is automatically constructed via Thelen and Riloff's Basilisk framework augmented with our linguistic and algorithmic modifications. The first approach labels a report using a simple heuristic, which looks for the words and phrases acquired during the semantic lexicon learning process in the report. The second approach recasts cause identification as a text classification problem, employing supervised and transductive text classification algorithms to learn models from incident reports labeled with shaping factors and using the models to label unseen reports. Our experiments show that both the heuristic-based approach and the learning-based approach (when given sufficient training data) outperform the baseline system significantly.
Learning to Predict Combinatorial Structures
The major challenge in designing a discriminative learning algorithm for predicting structured data is to address the computational issues arising from the exponential size of the output space. Existing algorithms make different assumptions to ensure efficient, polynomial time estimation of model parameters. For several combinatorial structures, including cycles, partially ordered sets, permutations and other graph classes, these assumptions do not hold. In this thesis, we address the problem of designing learning algorithms for predicting combinatorial structures by introducing two new assumptions: (i) The first assumption is that a particular counting problem can be solved efficiently. The consequence is a generalisation of the classical ridge regression for structured prediction. (ii) The second assumption is that a particular sampling problem can be solved efficiently. The consequence is a new technique for designing and analysing probabilistic structured prediction models. These results can be applied to solve several complex learning problems including but not limited to multi-label classification, multi-category hierarchical classification, and label ranking.
From Frequency to Meaning: Vector Space Models of Semantics
Computers understand very little of the meaning of human language. This profoundly limits our ability to give instructions to computers, the ability of computers to explain their actions to us, and the ability of computers to analyse and process text. Vector space models (VSMs) of semantics are beginning to address these limits. This paper surveys the use of VSMs for semantic processing of text. We organize the literature on VSMs according to the structure of the matrix in a VSM. There are currently three broad classes of VSMs, based on term-document, word-context, and pair-pattern matrices, yielding three classes of applications. We survey a broad range of applications in these three categories and we take a detailed look at a specific open source project in each category. Our goal in this survey is to show the breadth of applications of VSMs for semantics, to provide a new perspective on VSMs for those who are already familiar with the area, and to provide pointers into the literature for those who are less familiar with the field.
Fast, smooth and adaptive regression in metric spaces
It was recently shown that certain nonparametric regressors can escape the curse of dimensionality in the sense that their convergence rates adapt to the intrinsic dimension of data (\cite{BL:65, SK:77}). We prove some stronger results in more general settings. In particular, we consider a regressor which, by combining aspects of both tree-based regression and kernel regression, operates on a general metric space, yields a smooth function, and evaluates in time $O(\log n)$. We derive a tight convergence rate of the form $n^{-2/(2+d)}$ where $d$ is the Assouad dimension of the input space.
Robust Nonparametric Regression with Metric-Space Valued Output
Motivated by recent developments in manifold-valued regression we propose a family of nonparametric kernel-smoothing estimators with metric-space valued output including a robust median type estimator and the classical Frechet mean. Depending on the choice of the output space and the chosen metric the estimator reduces to partially well-known procedures for multi-class classification, multivariate regression in Euclidean space, regression with manifold-valued output and even some cases of structured output learning. In this paper we focus on the case of regression with manifold-valued input and output. We show pointwise and Bayes consistency for all estimators in the family for the case of manifold-valued output and illustrate the robustness properties of the estimator with experiments.
Speaker Comparison with Inner Product Discriminant Functions
Karam, Zahi, Sturim, Douglas, Campbell, William M.
Speaker comparison, the process of finding the speaker similarity between two speech signals, occupies a central role in a variety of applications---speaker verification, clustering, and identification. Speaker comparison can be placed in a geometric framework by casting the problem as a model comparison process. For a given speech signal, feature vectors are produced and used to adapt a Gaussian mixture model (GMM). Speaker comparison can then be viewed as the process of compensating and finding metrics on the space of adapted models. We propose a framework, inner product discriminant functions (IPDFs), which extends many common techniques for speaker comparison: support vector machines, joint factor analysis, and linear scoring. The framework uses inner products between the parameter vectors of GMM models motivated by several statistical methods. Compensation of nuisances is performed via linear transforms on GMM parameter vectors. Using the IPDF framework, we show that many current techniques are simple variations of each other. We demonstrate, on a 2006 NIST speaker recognition evaluation task, new scoring methods using IPDFs which produce excellent error rates and require significantly less computation than current techniques.
Online Estimation of SAT Solving Runtime
We present an online method for estimating the cost of solving SAT problems. Modern SAT solvers present several challenges to estimate search cost including non-chronological backtracking, learning and restarts. Our method uses a linear model trained on data gathered at the start of search. We show the effectiveness of this method using random and structured problems. We demonstrate that predictions made in early restarts can be used to improve later predictions. We also show that we can use such cost estimations to select a solver from a portfolio.