Goto

Collaborating Authors

 Supervised Learning


Learning Kernels for Structured Prediction using Polynomial Kernel Transformations

arXiv.org Machine Learning

Learning the kernel functions used in kernel methods has been a vastly explored area in machine learning. It is now widely accepted that to obtain 'good' performance, learning a kernel function is the key challenge. In this work we focus on learning kernel representations for structured regression. We propose use of polynomials expansion of kernels, referred to as Schoenberg transforms and Gegenbaur transforms, which arise from the seminal result of Schoenberg (1938). These kernels can be thought of as polynomial combination of input features in a high dimensional reproducing kernel Hilbert space (RKHS). We learn kernels over input and output for structured data, such that, dependency between kernel features is maximized. We use Hilbert-Schmidt Independence Criterion (HSIC) to measure this. We also give an efficient, matrix decomposition-based algorithm to learn these kernel transformations, and demonstrate state-of-the-art results on several real-world datasets.


Calibrated Structured Prediction

Neural Information Processing Systems

In user-facing applications, displaying calibrated confidence measures---probabilities that correspond to true frequency---can be as important as obtaining high accuracy. We are interested in calibration for structured prediction problems such as speech recognition, optical character recognition, and medical diagnosis. Structured prediction presents new challenges for calibration: the output space is large, and users may issue many types of probability queries (e.g., marginals) on the structured output. We extend the notion of calibration so as to handle various subtleties pertaining to the structured setting, and then provide a simple recalibration method that trains a binary classifier to predict probabilities of interest. We explore a range of features appropriate for structured recalibration, and demonstrate their efficacy on three real-world datasets.


Predtron: A Family of Online Algorithms for General Prediction Problems

Neural Information Processing Systems

Modern prediction problems arising in multilabel learning and learning to rank pose unique challenges to the classical theory of supervised learning. These problems have large prediction and label spaces of a combinatorial nature and involve sophisticated loss functions. We offer a general framework to derive mistake driven online algorithms and associated loss bounds. The key ingredients in our framework are a general loss function, a general vector space representation of predictions, and a notion of margin with respect to a general norm. Our general algorithm, Predtron, yields the perceptron algorithm and its variants when instantiated on classic problems such as binary classification, multiclass classification, ordinal regression, and multilabel classification. For multilabel ranking and subset ranking, we derive novel algorithms, notions of margins, and loss bounds. A simulation study confirms the behavior predicted by our bounds and demonstrates the flexibility of the design choices in our framework.


Neural Network Matrix Factorization

arXiv.org Machine Learning

Data often comes in the form of an array or matrix. Matrix factorization techniques attempt to recover missing or corrupted entries by assuming that the matrix can be written as the product of two low-rank matrices. In other words, matrix factorization approximates the entries of the matrix by a simple, fixed function---namely, the inner product---acting on the latent feature vectors for the corresponding row and column. Here we consider replacing the inner product by an arbitrary function that we learn from the data at the same time as we learn the latent feature vectors. In particular, we replace the inner product by a multi-layer feed-forward neural network, and learn by alternating between optimizing the network for fixed latent features, and optimizing the latent features for a fixed network. The resulting approach---which we call neural network matrix factorization or NNMF, for short---dominates standard low-rank techniques on a suite of benchmark but is dominated by some recent proposals that take advantage of the graph features. Given the vast range of architectures, activation functions, regularizers, and optimization techniques that could be used within the NNMF framework, it seems likely the true potential of the approach has yet to be reached.


A Generative Model of Words and Relationships from Multiple Sources

arXiv.org Machine Learning

Neural language models are a powerful tool to embed words into semantic vector spaces. However, learning such models generally relies on the availability of abundant and diverse training examples. In highly specialised domains this requirement may not be met due to difficulties in obtaining a large corpus, or the limited range of expression in average use. Such domains may encode prior knowledge about entities in a knowledge base or ontology. We propose a generative model which integrates evidence from diverse data sources, enabling the sharing of semantic information. We achieve this by generalising the concept of co-occurrence from distributional semantics to include other relationships between entities or words, which we model as affine transformations on the embedding space. We demonstrate the effectiveness of this approach by outperforming recent models on a link prediction task and demonstrating its ability to profit from partially or fully unobserved data training labels. We further demonstrate the usefulness of learning from different data sources with overlapping vocabularies.


Heterogeneous Knowledge Transfer in Video Emotion Recognition, Attribution and Summarization

arXiv.org Artificial Intelligence

Rapid development of mobile devices has led to an explosive growth of user-generated images and videos, which creates a demand for computational understanding of visual media content. In addition to recognition of objective content, such as objects and scenes, an important dimension of video content analysis is the understanding of emotional or affective content, i.e. estimating the emotional impact of the video on a viewer. Emotional content can strongly resonate with viewers and plays a crucial role in the videowatching experience. Some successes have been achieved with the use of deep-learning architectures trained for text at both sentence-and document-level [40] or image sentiment analysis [8]. However, the ability to understand emotions from video, to a large extent, remains an unsolved problem. Analysis of emotional content in video has many realworld applications. Video recommendation services can benefit from matching user interests with the emotions of video content and prediction of interestingness [20], [21], [36], leading to improved user satisfaction. Better understanding of video emotions may enable advertising that is consistent with the main video's mood and help avoid social inappropriateness such as placing a funny advertisement alongside a funeral video. Video summarization [68] and coding [60] can also benefit from understanding emotions, since an accurate summary should keep the emotional content conveyed by the original video.


Tropel: Crowdsourcing Detectors with Minimal Training

AAAI Conferences

This paper introduces the Tropel system which enables non-technical users to create arbitrary visual detectors without first annotating a training set. Our primary contribution is a crowd active learning pipeline that is seeded with only a single positive example and an unlabeled set of training images. We examine the crowd's ability to train visual detectors given severely limited training themselves. This paper presents a series of experiments that reveal the relationship between worker training, worker consensus and the average precision of detectors trained by crowd-in-the-loop active learning. In order to verify the efficacy of our system, we train detectors for bird species that work nearly as well as those trained on the exhaustively labeled CUB 200 dataset at significantly lower cost and with little effort from the end user. To further illustrate the usefulness of our pipeline, we demonstrate qualitative results on unlabeled datasets containing fashion images and street-level photographs of Paris.


Modeling Temporal Crowd Work Quality with Limited Supervision

AAAI Conferences

While recent work has shown that a worker’s performance can be more accurately modeled by temporal correlation in task performance, a fundamental challenge remains in the need for expert gold labels to evaluate a worker’s performance. To solve this problem, we explore two methods of utilizing limited gold labels, initial training and periodic updating. Furthermore, we present a novel way of learning a prediction model in the absence of gold labels with uncertaintyaware learning and soft-label updating. Our experiment with a real crowdsourcing dataset demonstrates that periodic updating tends to show better performance than initial training when the number of gold labels are very limited (< 25).


A Review of Relational Machine Learning for Knowledge Graphs

arXiv.org Machine Learning

In this paper, we provide a review of how such statistical models can be "trained" on large knowledge graphs, and then used to predict new facts about the world (which is equivalent to predicting new edges in the graph). In particular, we discuss two fundamentally different kinds of statistical relational models, both of which can scale to massive datasets. The first is based on latent feature models such as tensor factorization and multiway neural networks. The second is based on mining observable patterns in the graph. We also show how to combine these latent and observable models to get improved modeling power at decreased computational cost. Finally, we discuss how such statistical models of graphs can be combined with text-based information extraction methods for automatically constructing knowledge graphs from the Web. To this end, we also discuss Google's Knowledge Vault project as an example of such combination.


IllinoisSL: A JAVA Library for Structured Prediction

arXiv.org Machine Learning

IllinoisSL is a Java library for learning structured prediction models. It supports structured Support Vector Machines and structured Perceptron. The library consists of a core learning module and several applications, which can be executed from command-lines. Documentation is provided to guide users. In Comparison to other structured learning libraries, IllinoisSL is efficient, general, and easy to use.