 Inductive Learning


Asymmetric Learning Vector Quantization for Efficient Nearest Neighbor Classification in Dynamic Time Warping Spaces

arXiv.org Machine Learning

The nearest neighbor (NN) classifier endowed with the dynamic time warping (DTW) distance is one of the most popular methods in time series classification [9, 44]. Application examples include electrocardiogram frame classification [16], gesture recognition [2, 32], speech recognition [24], and voice recognition [23]. Two disadvantages of the naive NN method are high storage and computation requirements. Storage requirements are high because the entire training set must be retained to execute the classification rule. Computation requirements are high because classifying a test example requires computing the DTW distance between the test example and every training example.
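To make the storage and computation costs described above concrete, here is a minimal sketch of a naive 1-NN classifier under DTW. It is an illustration only, not the method proposed in the paper: the function names `dtw_distance` and `nn_dtw_classify` are hypothetical, the series are assumed to be univariate lists of numbers, and squared differences are assumed as the local cost.

```python
import math


def dtw_distance(a, b):
    """DTW distance between two univariate series (assumed local cost: squared difference)."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j],      # a[i-1] also matched to an earlier b
                                 cost[i][j - 1],      # b[j-1] also matched to an earlier a
                                 cost[i - 1][j - 1])  # one-to-one step
    return math.sqrt(cost[n][m])


def nn_dtw_classify(train, query):
    """Naive 1-NN: keeps every training series and compares the query to all of them."""
    best_label, best_dist = None, math.inf
    for series, label in train:
        dist = dtw_distance(series, query)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


# Toy data, invented for illustration.
train = [([0, 0, 1, 1], "up"), ([1, 1, 0, 0], "down")]
predicted = nn_dtw_classify(train, [0, 1, 1])
```

Each call to `dtw_distance` fills an n-by-m table, and `nn_dtw_classify` makes one such call per stored training series, which is where both the storage and the computation requirements come from.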


Using Graphs of Classifiers to Impose Declarative Constraints on Semi-supervised Learning

arXiv.org Machine Learning

We propose a general approach to modeling semi-supervised learning (SSL) algorithms. Specifically, we present a declarative language for modeling both traditional supervised classification tasks and many SSL heuristics, including both well-known heuristics such as co-training and novel domain-specific heuristics. In addition to representing individual SSL heuristics, we show that multiple heuristics can be automatically combined using Bayesian optimization methods. We experiment with two classes of tasks, link-based text classification and relation extraction. We show modest improvements on well-studied link-based classification benchmarks, and state-of-the-art results on relation-extraction tasks for two realistic domains.


Article 1: Why Machine Learning? – Apurba Learns ML

#artificialintelligence

For those who aren't familiar with the show, it depicts an all-seeing AI that can predict crime and other immoral acts before they even happen and passes on the information to the Government. One of the episodes of the show illustrates the origin of the Machine, where its creator -- Harold Finch -- is teaching it to distinguish between good and bad by showing it "examples". This is a perfect instance of Supervised learning -- we have a data-set or example set with the right answers given. Afterwards, we expect the computer to predict things based on the data-set. Now, to be a bit more specific, the above scenario depicts "Classification", meaning that the predicted output will fall into one of two or more discrete categories -- "good" and "bad" in our case.


Nonconvex One-bit Single-label Multi-label Learning

arXiv.org Machine Learning

An important topic in multi-label learning research is how to exploit the relationship between different classes of labels in order to improve learning accuracy or reduce the number of required labels. When labels are partially observed, the low-rank matrix model is one of the most popular models for dealing with missing labels. As human labeling is usually expensive and time-consuming, it is critical to design a robust algorithm that can learn the underlying low-rank matrix model on datasets with noisy, heavily missing labels. In this work, we consider an extreme scenario where each training instance has only a single label annotated, with a binary value, out of multiple classes of labels. This scenario is often encountered in real-world systems but is less discussed in the literature. For example, it is rare for a user to annotate a news article or a piece of music with many tags, especially when the user is not paid for the annotation. The problem becomes challenging when we have a large number of features and classes. Over the past decades, a number of multi-label learning approaches have been proposed under different settings.


The Six Steps to Boosted Trees

#artificialintelligence

BigML is bringing Boosted Trees to our ever-growing suite of supervised learning techniques. Boosting is a variation on ensembles that aims to reduce bias, potentially leading to better performance than Bagging or Random Decision Forests. In our first blog post of this series of six posts about Boosted Trees, we saw a gentle introduction to Boosted Trees to get some context about what this new resource is and how it can help you solve your classification and regression problems. This post will take us further, into the detailed steps of how to use boosting with BigML. To learn from our data, we must first upload it.


Discriminate-and-Rectify Encoders: Learning from Image Transformation Sets

arXiv.org Machine Learning

The complexity of a learning task is increased by transformations in the input space that preserve class identity. Visual object recognition, for example, is affected by changes in viewpoint, scale, illumination or planar transformations. While drastically altering the visual appearance, these changes are orthogonal to recognition and should not be reflected in the representation or feature encoding used for learning. We introduce a framework for weakly supervised learning of image embeddings that are robust to transformations and selective to the class distribution, using sets of transforming examples (orbit sets), deep parametrizations and a novel orbit-based loss. The proposed loss combines a discriminative, contrastive part for orbits with a reconstruction error that learns to rectify orbit transformations. The learned embeddings are evaluated in distance metric-based tasks, such as one-shot classification under geometric transformations, as well as face verification and retrieval under more realistic visual variability. Our results suggest that orbit sets, suitably computed or observed, can be used for efficient, weakly supervised learning of semantically relevant image embeddings. This material is based upon work supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216.


The Crossover Process: Learnability and Data Protection from Inference Attacks

arXiv.org Machine Learning

It is usual to consider data protection and learnability as conflicting objectives. This is not always the case: we show how to jointly control inference (seen as the attack) and learnability by a noise-free process that mixes training examples, the Crossover Process (cp). One key point is that the cp is typically able to alter joint distributions without touching the marginals or altering the sufficient statistic for the class. In other words, it saves (and sometimes improves) generalization for supervised learning, but can alter the relationship between covariates, and can therefore fool measures of nonlinear independence and causal inference into misleading ad-hoc conclusions. For example, a cp can increase or decrease odds ratios, bring fairness or break fairness, tamper with disparate impact, strengthen, weaken or reverse causal directions, and change observed statistical measures of dependence. For each of these, we quantify the changes brought by a cp, as well as its statistical impact on generalization abilities via a new complexity measure that we call the Rademacher cp complexity. Experiments on a dozen readily available domains validate the theory.


Belief Propagation in Conditional RBMs for Structured Prediction

arXiv.org Machine Learning

Restricted Boltzmann machines (RBMs) and conditional RBMs (CRBMs) are popular models for a wide range of applications. In previous work, learning on such models has been dominated by contrastive divergence (CD) and its variants. Belief propagation (BP) algorithms are believed to be slow for structured prediction on conditional RBMs (e.g., Mnih et al. [2011]), and not as good as CD when applied in learning (e.g., Larochelle et al. [2012]). In this work, we present a matrix-based implementation of belief propagation algorithms on CRBMs, which is easily scalable to tens of thousands of visible and hidden units. We demonstrate that, in both maximum likelihood and max-margin learning, training conditional RBMs with BP as the inference routine can provide significantly better results than current state-of-the-art CD methods on structured prediction problems. We also include practical guidelines on training CRBMs with BP, and some insights on the interaction of learning and inference algorithms for CRBMs.


Lipschitz Optimisation for Lipschitz Interpolation

arXiv.org Machine Learning

Supervised machine learning methods are algorithms for inductive inference. On the basis of a sample, they construct (learn) a computable model of a data generating process that facilitates inference over the underlying ground truth function and aims to predict its function values at unobserved inputs. Among supervised learning methods, nonparametric algorithms tend to offer greater flexibility to learn rich function classes. Unfortunately, many classical techniques for nonparametric regression, such as the Nadaraya-Watson estimator [21], [14] or the LOESS method [6], suffer from a practical limitation: their regression performance depends on the choice of hyperparameters. While in principle it would be possible to tune these to the data (in a manner similar in spirit to the one we propose in this work), to the best of our knowledge there is currently little understanding of how to do so with a global optimiser that offers theoretical performance guarantees on the optimisation solution. This means that in practice, one is left to engineer these hyperparameters (or the settings of an optimiser) by manual tuning in order to ensure good performance on a particular learning problem. Of course, this stands in opposition to the motivation for utilising nonparametric learning, especially in system identification, which is to facilitate flexible and fully automated black-box learning that does not require manual intervention.
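The hyperparameter dependence mentioned above can be illustrated with a minimal sketch of the Nadaraya-Watson estimator. This is not code from the paper: the function name `nadaraya_watson`, the Gaussian kernel, and the toy data are assumptions chosen for illustration, and `bandwidth` stands for the hyperparameter that would otherwise be tuned by hand.

```python
import math


def nadaraya_watson(xs, ys, x_query, bandwidth):
    """Kernel-weighted average of ys around x_query (assumed kernel: Gaussian)."""
    weights = [math.exp(-0.5 * ((x_query - x) / bandwidth) ** 2) for x in xs]
    total = sum(weights)
    # Assumes at least one weight is nonzero; with a very small bandwidth and a
    # query far from all samples, the weights can underflow to zero.
    return sum(w * y for w, y in zip(weights, ys)) / total


# Toy sample, invented for illustration.
xs = [0.0, 1.0, 2.0]
ys = [0.0, 1.0, 4.0]

narrow = nadaraya_watson(xs, ys, 1.0, bandwidth=0.01)  # follows the nearest sample
wide = nadaraya_watson(xs, ys, 1.0, bandwidth=1e6)     # approaches the global mean of ys
```

With the same data and the same query point, the prediction moves from the value of the nearest sample to the average of all targets as the bandwidth grows, which is the kind of sensitivity the passage describes.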