Goto

Collaborating Authors

 Inductive Learning


Classification with Asymmetric Label Noise: Consistency and Maximal Denoising

arXiv.org Machine Learning

In many real-world classification problems, the labels of training examples are randomly corrupted. Most previous theoretical work on classification with label noise assumes that the two classes are separable, that the label noise is independent of the true class label, or that the noise proportions for each class are known. In this work, we give conditions that are necessary and sufficient for the true class-conditional distributions to be identifiable. These conditions are weaker than those analyzed previously, and allow for the classes to be nonseparable and the noise levels to be asymmetric and unknown. The conditions essentially state that a majority of the observed labels are correct and that the true class-conditional distributions are "mutually irreducible," a concept we introduce that limits the similarity of the two distributions. For any label noise problem, there is a unique pair of true class-conditional distributions satisfying the proposed conditions, and we argue that this pair corresponds in a certain sense to maximal denoising of the observed distributions. Our results are facilitated by a connection to "mixture proportion estimation," which is the problem of estimating the maximal proportion of one distribution that is present in another. We establish a novel rate of convergence result for mixture proportion estimation, and apply this to obtain consistency of a discrimination rule based on surrogate loss minimization. Experimental results on benchmark data and a nuclear particle classification problem demonstrate the efficacy of our approach.


Multiple Instance Dictionary Learning using Functions of Multiple Instances

arXiv.org Machine Learning

A multiple instance dictionary learning method using functions of multiple instances (DL-FUMI) is proposed to address target detection and two-class classification problems with inaccurate training labels. Given inaccurate training labels, DL-FUMI learns a set of target dictionary atoms that describe the most distinctive and representative features of the true positive class as well as a set of nontarget dictionary atoms that account for the shared information found in both the positive and negative instances. Experimental results show that the estimated target dictionary atoms found by DL-FUMI are more representative prototypes and identify better discriminative features of the true positive class than existing methods in the literature. DL-FUMI is shown to have significantly better performance on several target detection and classification problems as compared to other multiple instance learning (MIL) dictionary learning algorithms on a variety of MIL problems.


Relational Similarity Machines

arXiv.org Machine Learning

This paper proposes Relational Similarity Machines (RSM): a fast, accurate, and flexible relational learning framework for supervised and semi-supervised learning tasks. Despite the importance of relational learning, most existing methods are hard to adapt to different settings, due to issues with efficiency, scalability, accuracy, and flexibility for handling a wide variety of classification problems, data, constraints, and tasks. For instance, many existing methods perform poorly for multi-class classification problems, graphs that are sparsely labeled or network data with low relational autocorrelation. In contrast, the proposed relational learning framework is designed to be (i) fast for learning and inference at real-time interactive rates, and (ii) flexible for a variety of learning settings (multi-class problems), constraints (few labeled instances), and application domains. The experiments demonstrate the effectiveness of RSM for a variety of tasks and data.


gLOP: the global and Local Penalty for Capturing Predictive Heterogeneity

arXiv.org Machine Learning

When faced with a supervised learning problem, we hope to have rich enough data to build a model that predicts future instances well. However, in practice, problems can exhibit predictive heterogeneity: most instances might be relatively easy to predict, while others might be predictive outliers for which a model trained on the entire dataset does not perform well. Identifying these can help focus future data collection. We present gLOP, the global and Local Penalty, a framework for capturing predictive heterogeneity and identifying predictive outliers. gLOP is based on penalized regression for multitask learning, which improves learning by leveraging training signal information from related tasks. We give two optimization algorithms for gLOP, one space-efficient, and another giving the full regularization path. We also characterize uniqueness in terms of the data and tuning parameters, and present empirical results on synthetic data and on two health research problems.


Shuffle and Learn: Unsupervised Learning using Temporal Order Verification

arXiv.org Artificial Intelligence

In this paper, we present an approach for learning a visual representation from the raw spatiotemporal signals in videos. Our representation is learned without supervision from semantic labels. We formulate our method as an unsupervised sequential verification task, i.e., we determine whether a sequence of frames from a video is in the correct temporal order. With this simple task and no semantic labels, we learn a powerful visual representation using a Convolutional Neural Network (CNN). The representation contains complementary information to that learned from supervised image datasets like ImageNet. Qualitative results show that our method captures information that is temporally varying, such as human pose. When used as pre-training for action recognition, our method gives significant gains over learning without external data on benchmark datasets like UCF101 and HMDB51. To demonstrate its sensitivity to human pose, we show results for pose estimation on the FLIC and MPII datasets that are competitive, or better than approaches using significantly more supervision. Our method can be combined with supervised representations to provide an additional boost in accuracy.


Interactive Learning from Multiple Noisy Labels

arXiv.org Machine Learning

We consider binary classification problems in the presence of a teacher, who acts as an intermediary to provide a learning algorithm with meaningful, well-chosen examples. This setting is also known as curriculum learning [1, 2, 3] or self-paced learning [4, 5, 6] in the literature. Existing practical methods [4, 7] that employ such a teacher operate by providing the learning algorithm with easy examples first and then progressively moving on to more difficult examples. Such a strategy is known to improve the generalization ability of the learning algorithm and/or alleviate local minima problems while optimizing non-convex objective functions. In this work, we propose a new method to quantify the notion of easiness of a training example.


Untangling AdaBoost-based Cost-Sensitive Classification. Part I: Theoretical Perspective

arXiv.org Artificial Intelligence

Boosting algorithms have been widely used to tackle a plethora of problems. In the last few years, a lot of approaches have been proposed to provide standard AdaBoost with cost-sensitive capabilities, each with a different focus. However, for the researcher, these algorithms shape a tangled set with diffuse differences and properties, lacking a unifying analysis to jointly compare, classify, evaluate and discuss those approaches on a common basis. In this series of two papers we aim to revisit the various proposals, both from theoretical (Part I) and practical (Part II) perspectives, in order to analyze their specific properties and behavior, with the final goal of identifying the algorithm providing the best and soundest results.


Robot takes on the role of scheduling nurse in a busy hospital labor ward

#artificialintelligence

We hear plenty of stories about AI being used in medicine, whether it's discovering new drugs or helping diagnose diseases based on symptoms which may be imperceptible to even expert physicians. One area we've not previously heard about machines working in, however, is in the role of "resource nurse" in a hospital. A nurse in this position in the labor and delivery ward is responsible for making decisions about the rooms patients should be assigned to, or which physician should perform a C-section. That's work that researchers in MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) attempted to replicate recently -- with a Nao robot trained to learn how these scheduling choices are made and make similar decisions on its own. "What we were able to show is that a system with only a few dozen training examples from people performing a task very well was able to make decisions which appear to be reasonable," Professor Julie Shah, of of the authors of the study, told Digital Trends.


Minimum Description Length Principle in Supervised Learning with Application to Lasso

arXiv.org Machine Learning

The minimum description length (MDL) principle in supervised learning is studied. One of the most important theories for the MDL principle is Barron and Cover's theory (BC theory), which gives a mathematical justification of the MDL principle. The original BC theory, however, can be applied to supervised learning only approximately and limitedly. Though Barron et al. recently succeeded in removing a similar approximation in case of unsupervised learning, their idea cannot be essentially applied to supervised learning in general. To overcome this issue, an extension of BC theory to supervised learning is proposed. The derived risk bound has several advantages inherited from the original BC theory. First, the risk bound holds for finite sample size. Second, it requires remarkably few assumptions. Third, the risk bound has a form of redundancy of the two-stage code for the MDL procedure. Hence, the proposed extension gives a mathematical justification of the MDL principle to supervised learning like the original BC theory. As an important example of application, new risk and (probabilistic) regret bounds of lasso with random design are derived. The derived risk bound holds for any finite sample size $n$ and feature number $p$ even if $n\ll p$ without boundedness of features in contrast to the past work. Behavior of the regret bound is investigated by numerical simulations. We believe that this is the first extension of BC theory to general supervised learning with random design without approximation.


A Gentle Guide to Machine Learning MonkeyLearn Blog

#artificialintelligence

Machine Learning is a subfield within Artificial Intelligence that builds algorithms that allow computers to learn to perform tasks from data instead of being explicitly programmed. We can make machines learn to do things! The first time I heard that, it blew my mind. That means that we can program computers to learn things by themselves! The ability of learning is one of the most important aspects of intelligence. Translating that power to machines, sounds like a huge step towards making them more intelligent. And in fact, Machine Learning is the area that is making most of the progress in Artificial Intelligence today; being a trendy topic right now and pushing the possibility to have more intelligent machines.