Supervised Learning
Machine Learning Methods: Classification without negative examples - EFavDB
Here, we discuss some methods for carrying out classification when only positive examples are available. The latter half of our discussion borrows heavily from W.S. Lee and B. Liu, Proc. Logistic regression is a commonly used tool for estimating the level sets of a Boolean function y on a set of feature vectors \textbf{F}. In a sense, you can think of it as a method for playing the game "Battleship" on whatever data set you're interested in. Consider now a situation where all training examples given are positive -- i.e., no negative examples are available.
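As a rough, hedged illustration of this positive-only setting (not the post's exact method), one simple baseline is to treat unlabeled points as tentative negatives and down-weight them in a logistic regression. The synthetic data and the 0.5 weight below are assumptions.

```python
# Minimal sketch (assumptions: synthetic data, scikit-learn available).
# Baseline for positive-only classification: treat unlabeled points as
# tentative negatives and down-weight them in a logistic regression.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_pos = rng.normal(loc=2.0, size=(100, 2))         # known positives
X_unl = rng.normal(loc=0.0, size=(400, 2))         # unlabeled mix

X = np.vstack([X_pos, X_unl])
y = np.concatenate([np.ones(100), np.zeros(400)])  # unlabeled coded as 0

# Down-weight the "negative" label on unlabeled points, since some of
# them are actually positive.
w = np.concatenate([np.ones(100), 0.5 * np.ones(400)])

clf = LogisticRegression().fit(X, y, sample_weight=w)
print(clf.predict_proba(X[:5])[:, 1])              # estimated P(y=1 | features)
```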
Fast Metric Learning For Deep Neural Networks
Gouk, Henry, Pfahringer, Bernhard, Cree, Michael
Similarity metrics are a core component of many information retrieval and machine learning systems. In this work we propose a method capable of learning a similarity metric from data equipped with a binary relation. By considering only the similarity constraints, and initially ignoring the features, we are able to learn target vectors for each instance using one of several appropriately designed loss functions. A regression model can then be constructed that maps novel feature vectors to the same target vector space, resulting in a feature extractor that computes vectors for which a predefined metric is a meaningful measure of similarity. We present results on both multiclass and multi-label classification datasets that demonstrate considerably faster convergence, as well as higher accuracy on the majority of the intrinsic evaluation tasks and all extrinsic evaluation tasks.
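A very rough sketch of the two-stage recipe described above, under assumed choices: target vectors are obtained here by a spectral embedding of the similarity relation, and the regression model is a ridge regressor; both are stand-ins for the loss functions and feature extractor actually used in the paper.

```python
# Very rough sketch of a two-stage "targets then regression" metric learner.
# The spectral embedding and ridge regression below are stand-in choices,
# not the loss functions or network used in the paper.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n, d, k = 120, 10, 3                             # instances, features, target dim
labels = rng.integers(0, 4, size=n)
X = rng.normal(size=(n, d)) + labels[:, None]    # toy features

# Stage 1: binary similarity relation -> target vectors (spectral embedding).
S = (labels[:, None] == labels[None, :]).astype(float)
eigvals, eigvecs = np.linalg.eigh(S)
targets = eigvecs[:, -k:] * np.sqrt(np.maximum(eigvals[-k:], 0.0))

# Stage 2: regression model mapping features to the target vector space.
reg = Ridge(alpha=1.0).fit(X, targets)

# Euclidean distance in target space now acts as the learned metric.
Z = reg.predict(X)
query = Z[0]
dists = np.linalg.norm(Z - query, axis=1)
print(labels[np.argsort(dists)[:5]])             # nearest neighbours' labels
```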
Maximum margin classifier working in a set of strings
Koyano, Hitoshi, Hayashida, Morihiro, Akutsu, Tatsuya
Numbers and numerical vectors account for a large portion of data. However, the amount of string data generated has recently increased dramatically. Consequently, classifying string data is a common problem in many fields. The most widely used approach to this problem is to convert strings into numerical vectors using string kernels and subsequently apply a support vector machine that works in a numerical vector space. However, this conversion is not one-to-one; it involves a loss of information and makes it impossible to evaluate, using probability theory, the generalization error of a learning machine when the data used to train and test the machine are strings generated according to probability laws. In this study, we approach this classification problem by constructing a classifier that works in a set of strings. To evaluate the generalization error of such a classifier theoretically, probability theory for strings is required. Therefore, we first extend a limit theorem on the asymptotic behavior of a consensus sequence of strings, which is the counterpart of the mean of numerical vectors, as demonstrated in the probability theory on a metric space of strings developed by one of the authors and his colleague in a previous study [18]. Using the obtained result, we then demonstrate that our learning machine classifies strings in an asymptotically optimal manner. Furthermore, we demonstrate the usefulness of our machine in practical data analysis by applying it to predicting protein--protein interactions using amino acid sequences.
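For contrast, here is a minimal sketch of the baseline the paper argues against: strings are mapped to k-mer count vectors (a simple spectrum-kernel feature map) and fed to an SVM in numerical vector space. The toy alphabet, sequences, and k = 2 are assumptions.

```python
# Sketch of the string-kernel baseline: map strings to k-mer count vectors
# (a simple "spectrum kernel" feature map) and run an SVM on the vectors.
from collections import Counter
from itertools import product
import numpy as np
from sklearn.svm import SVC

ALPHABET = "ACDE"    # toy alphabet; amino-acid data would use 20 letters
KMERS = ["".join(p) for p in product(ALPHABET, repeat=2)]

def spectrum_features(s, k=2):
    counts = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    return np.array([counts[m] for m in KMERS], dtype=float)

strings = ["ACACAC", "ACACAA", "DEDEDE", "DEDEEE", "ACACAD", "DEDECD"]
y = [0, 0, 1, 1, 0, 1]
X = np.vstack([spectrum_features(s) for s in strings])

clf = SVC(kernel="linear").fit(X, y)
print(clf.predict(spectrum_features("ACACDE").reshape(1, -1)))
```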
First-order Methods for Geodesically Convex Optimization
Geodesic convexity generalizes the notion of (vector space) convexity to nonlinear metric spaces. But unlike convex optimization, geodesically convex (g-convex) optimization is much less developed. In this paper we contribute to the understanding of g-convex optimization by developing iteration complexity analysis for several first-order algorithms on Hadamard manifolds. Specifically, we prove upper bounds for the global complexity of deterministic and stochastic (sub)gradient methods for optimizing smooth and nonsmooth g-convex functions, both with and without strong g-convexity. Our analysis also reveals how the manifold geometry, especially \emph{sectional curvature}, impacts convergence rates. To the best of our knowledge, our work is the first to provide global complexity analysis for first-order algorithms for general g-convex optimization.
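One concrete g-convex problem on a Hadamard manifold is the Karcher (geometric) mean of symmetric positive definite matrices. The sketch below runs a plain Riemannian gradient descent on it, with an assumed step size; it illustrates the setting rather than any specific algorithm analyzed in the paper.

```python
# Sketch of Riemannian gradient descent on a Hadamard manifold: computing
# the Karcher (geometric) mean of SPD matrices, a standard g-convex problem.
# Step size and iteration count below are assumptions for illustration.
import numpy as np
from scipy.linalg import sqrtm, logm, expm, inv

rng = np.random.default_rng(0)

def random_spd(d):
    B = rng.normal(size=(d, d))
    return B @ B.T + d * np.eye(d)

A = [random_spd(3) for _ in range(5)]       # data points on the SPD manifold
X = np.mean(A, axis=0)                      # arithmetic mean as starting point

for _ in range(50):
    Xh = np.real(sqrtm(X))
    Xh_inv = inv(Xh)
    # Riemannian gradient step: average the log-maps of the data at X,
    # then move along the geodesic via the exponential map.
    G = np.mean([np.real(logm(Xh_inv @ Ai @ Xh_inv)) for Ai in A], axis=0)
    X = Xh @ np.real(expm(0.5 * G)) @ Xh    # step size 0.5 (assumed)

print(np.round(X, 3))                       # approximate geometric mean
```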
Learning Kernels for Structured Prediction using Polynomial Kernel Transformations
Tonde, Chetan, Elgammal, Ahmed
Learning the kernel functions used in kernel methods has been a vastly explored area in machine learning. It is now widely accepted that to obtain 'good' performance, learning a kernel function is the key challenge. In this work we focus on learning kernel representations for structured regression. We propose the use of polynomial expansions of kernels, referred to as Schoenberg transforms and Gegenbauer transforms, which arise from the seminal result of Schoenberg (1938). These kernels can be thought of as polynomial combinations of input features in a high dimensional reproducing kernel Hilbert space (RKHS). We learn kernels over inputs and outputs for structured data such that the dependency between kernel features is maximized. We use the Hilbert-Schmidt Independence Criterion (HSIC) to measure this. We also give an efficient, matrix decomposition-based algorithm to learn these kernel transformations, and demonstrate state-of-the-art results on several real-world datasets.
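A minimal sketch of the HSIC score used as the dependence measure, computed between an input kernel and an output kernel; the RBF kernels and toy regression data are assumptions, not the learned Schoenberg/Gegenbauer transforms.

```python
# Minimal sketch of the (biased) empirical HSIC score between an input
# kernel K and an output kernel L: trace(K H L H) / (n - 1)^2.
import numpy as np

def rbf_kernel(X, gamma=1.0):
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-gamma * d2)

def hsic(K, L):
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                                       # inputs
Y = X @ rng.normal(size=(5, 2)) + 0.1 * rng.normal(size=(100, 2))   # outputs

K, L = rbf_kernel(X), rbf_kernel(Y)
print(hsic(K, L), hsic(K, rbf_kernel(rng.normal(size=(100, 2)))))
# the dependent pair should score noticeably higher than the independent one
```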
Learning From Small Samples: An Analysis of Simple Decision Heuristics
Şimşek, Özgür, Buckmann, Marcus
Simple decision heuristics are models of human and animal behavior that use few pieces of information--perhaps only a single piece of information--and integrate the pieces in simple ways, for example, by considering them sequentially, one at a time, or by giving them equal weight. We focus on three families of heuristics: single-cue decision making, lexicographic decision making, and tallying. It is unknown how quickly these heuristics can be learned from experience. We show, analytically and empirically, that substantial progress in learning can be made with just a few training samples. When training samples are very few, tallying performs substantially better than the alternative methods tested. Our empirical analysis is the most extensive to date, employing 63 natural data sets on diverse subjects.
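As a small illustration, the tallying heuristic for paired comparison can be written in a few lines: every cue gets equal weight, and only the cue directions are learned from the training pairs. The toy data below are assumed.

```python
# Minimal sketch of the "tallying" heuristic for paired comparison: give
# every cue equal weight and choose the option favoured by more cues.
# Cue directions are learned from a handful of training pairs (toy data).
import numpy as np

def learn_cue_directions(X_a, X_b, winner):
    # +1 if higher cue values tend to belong to the winning option, else -1.
    diffs = np.where(winner[:, None] == 0, X_a - X_b, X_b - X_a)
    return np.sign(np.sign(diffs).sum(axis=0) + 1e-9)

def tally(x_a, x_b, directions):
    votes = directions * np.sign(x_a - x_b)
    return 0 if votes.sum() > 0 else 1       # index of the chosen option

rng = np.random.default_rng(0)
X_a, X_b = rng.normal(size=(5, 4)), rng.normal(size=(5, 4))   # 5 training pairs
winner = (X_b.sum(axis=1) > X_a.sum(axis=1)).astype(int)      # toy ground truth

dirs = learn_cue_directions(X_a, X_b, winner)
print(tally(np.array([1.0, 0.2, 0.5, 0.1]), np.array([0.0, 0.3, 0.1, 0.0]), dirs))
```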
Calibrated Structured Prediction
Kuleshov, Volodymyr, Liang, Percy S.
In user-facing applications, displaying calibrated confidence measures---probabilities that correspond to true frequency---can be as important as obtaining high accuracy. We are interested in calibration for structured prediction problems such as speech recognition, optical character recognition, and medical diagnosis. Structured prediction presents new challenges for calibration: the output space is large, and users may issue many types of probability queries (e.g., marginals) on the structured output. We extend the notion of calibration so as to handle various subtleties pertaining to the structured setting, and then provide a simple recalibration method that trains a binary classifier to predict probabilities of interest. We explore a range of features appropriate for structured recalibration, and demonstrate their efficacy on three real-world datasets.
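A minimal sketch of the recalibration step under assumed synthetic confidences: features of a probability query are fed to a binary classifier that predicts whether the queried event is correct, and its output is used as the calibrated probability.

```python
# Sketch of the recalibration idea: take features of a probability query
# (here, just the model's raw confidence plus a squared term) and train a
# binary classifier to predict whether the queried event is correct.
# The synthetic "raw confidence" generator below is an assumption.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000
raw_conf = rng.uniform(size=n)                              # miscalibrated raw scores
correct = (rng.uniform(size=n) < raw_conf**2).astype(int)   # true frequency != raw score

feats = np.column_stack([raw_conf, raw_conf**2])            # recalibration features
recal = LogisticRegression().fit(feats, correct)

# Calibrated probability for a new query with raw confidence 0.9:
q = np.array([[0.9, 0.81]])
print(recal.predict_proba(q)[0, 1])
```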
Predtron: A Family of Online Algorithms for General Prediction Problems
Jain, Prateek, Natarajan, Nagarajan, Tewari, Ambuj
Modern prediction problems arising in multilabel learning and learning to rank pose unique challenges to the classical theory of supervised learning. These problems have large prediction and label spaces of a combinatorial nature and involve sophisticated loss functions. We offer a general framework to derive mistake driven online algorithms and associated loss bounds. The key ingredients in our framework are a general loss function, a general vector space representation of predictions, and a notion of margin with respect to a general norm. Our general algorithm, Predtron, yields the perceptron algorithm and its variants when instantiated on classic problems such as binary classification, multiclass classification, ordinal regression, and multilabel classification. For multilabel ranking and subset ranking, we derive novel algorithms, notions of margins, and loss bounds. A simulation study confirms the behavior predicted by our bounds and demonstrates the flexibility of the design choices in our framework.
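For reference, the classic perceptron that Predtron recovers in the binary-classification case looks like this on synthetic linearly separable data (an illustrative special case, not the general algorithm).

```python
# Sketch of the classic mistake-driven perceptron, the binary-classification
# instantiation that the Predtron framework recovers as a special case.
import numpy as np

rng = np.random.default_rng(0)
w_true = rng.normal(size=3)
X = rng.normal(size=(200, 3))
y = np.sign(X @ w_true)                      # labels in {-1, +1}

w = np.zeros(3)
for _ in range(10):                          # passes over the data
    for x_i, y_i in zip(X, y):
        if y_i * (w @ x_i) <= 0:             # mistake (or zero margin)
            w += y_i * x_i                   # perceptron update

print(np.mean(np.sign(X @ w) == y))          # training accuracy
```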
Neural Network Matrix Factorization
Dziugaite, Gintare Karolina, Roy, Daniel M.
Data often comes in the form of an array or matrix. Matrix factorization techniques attempt to recover missing or corrupted entries by assuming that the matrix can be written as the product of two low-rank matrices. In other words, matrix factorization approximates the entries of the matrix by a simple, fixed function---namely, the inner product---acting on the latent feature vectors for the corresponding row and column. Here we consider replacing the inner product by an arbitrary function that we learn from the data at the same time as we learn the latent feature vectors. In particular, we replace the inner product by a multi-layer feed-forward neural network, and learn by alternating between optimizing the network for fixed latent features, and optimizing the latent features for a fixed network. The resulting approach---which we call neural network matrix factorization or NNMF, for short---dominates standard low-rank techniques on a suite of benchmarks but is dominated by some recent proposals that take advantage of graph features. Given the vast range of architectures, activation functions, regularizers, and optimization techniques that could be used within the NNMF framework, it seems likely the true potential of the approach has yet to be reached.
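A rough PyTorch sketch of the idea, with assumed sizes and synthetic ratings: latent vectors for rows and columns are fed through a small feed-forward network in place of the inner product. For brevity it optimizes everything jointly with Adam rather than alternating between the network and the latent features, as the abstract describes.

```python
# Rough sketch of the NNMF idea: latent vectors for every row and column,
# with a small feed-forward network replacing the inner product.
# Joint optimization here is a simplification of the alternating scheme.
import torch
import torch.nn as nn

torch.manual_seed(0)
n_rows, n_cols, k = 50, 40, 5
rows = torch.randint(0, n_rows, (500,))
cols = torch.randint(0, n_cols, (500,))
ratings = torch.rand(500)                        # observed entries (toy)

row_emb = nn.Embedding(n_rows, k)                # latent features for rows
col_emb = nn.Embedding(n_cols, k)                # latent features for columns
mlp = nn.Sequential(nn.Linear(2 * k, 32), nn.ReLU(), nn.Linear(32, 1))

params = list(row_emb.parameters()) + list(col_emb.parameters()) + list(mlp.parameters())
opt = torch.optim.Adam(params, lr=1e-2)

for _ in range(200):
    opt.zero_grad()
    pred = mlp(torch.cat([row_emb(rows), col_emb(cols)], dim=1)).squeeze(1)
    loss = nn.functional.mse_loss(pred, ratings)
    loss.backward()
    opt.step()

print(float(loss))                               # final reconstruction error
```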