Inductive Learning
Transfer Learning in a Transductive Setting
Rohrbach, Marcus, Ebert, Sandra, Schiele, Bernt
Category models for objects or activities typically rely on supervised learning requiring sufficiently large training sets. Transferring knowledge from known categories to novel classes with no or only a few labels, however, is far less researched even though it is a common scenario. In this work, we extend transfer learning with semi-supervised learning to exploit unlabeled instances of (novel) categories with no or only a few labeled instances. Our proposed approach, Propagated Semantic Transfer, combines three main ingredients. First, we transfer information from known to novel categories by incorporating external knowledge, such as linguistic or expert-specified information, e.g., by a mid-level layer of semantic attributes. Second, we exploit the manifold structure of novel classes. More specifically, we adapt a graph-based learning algorithm - so far only used for semi-supervised learning - to zero-shot and few-shot learning. Third, we improve the local neighborhood in such graph structures by replacing the raw feature-based representation with a mid-level object- or attribute-based representation. We evaluate our approach on three challenging datasets in two different applications, namely on Animals with Attributes and ImageNet for image classification and on MPII Composites for activity recognition. Our approach consistently outperforms state-of-the-art transfer and semi-supervised approaches on all datasets.
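The graph-based step described in the abstract above can be illustrated with a generic label-spreading sketch. This is not the authors' implementation: the k-nearest-neighbour graph with 0/1 weights, the update rule F <- alpha * S * F + (1 - alpha) * Y, and the names `knn_graph`, `propagate`, `predict_labels`, `alpha` and `iterations` are all assumptions chosen for illustration. The rows of `transferred` stand in for class scores that a transfer model (for example, an attribute-based zero-shot classifier) might assign to unlabeled instances of novel classes, and `points` stands in for a mid-level representation.

```python
import math

def knn_graph(points, k):
    """Build a symmetric k-nearest-neighbour affinity matrix with 0/1 weights."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        nearest = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        for _, j in nearest[:k]:
            w[i][j] = w[j][i] = 1.0
    return w

def propagate(w, initial_scores, alpha=0.8, iterations=50):
    """Iterate F <- alpha * S * F + (1 - alpha) * Y, with S = D^-1/2 W D^-1/2."""
    n = len(w)
    n_classes = len(initial_scores[0])
    degree = [sum(row) or 1.0 for row in w]  # guard against isolated nodes
    s = [[w[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)] for i in range(n)]
    f = [row[:] for row in initial_scores]
    for _ in range(iterations):
        f = [[alpha * sum(s[i][j] * f[j][c] for j in range(n))
              + (1 - alpha) * initial_scores[i][c]
              for c in range(n_classes)]
             for i in range(n)]
    return f

def predict_labels(scores):
    """Pick the highest-scoring class per instance (first class wins ties)."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]

# Hypothetical toy data: two well-separated clusters of unlabeled instances.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
# Hypothetical transferred scores: confident on two instances, weakly wrong on one.
transferred = [[1.0, 0.0], [0.0, 0.2], [0.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]

smoothed = propagate(knn_graph(points, k=2), transferred)
print(predict_labels(transferred), predict_labels(smoothed))
```

In this toy setup the weakly wrong score on the second instance is overridden by its neighbours, which is the kind of effect exploiting the manifold structure is meant to provide.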
One-Class Classification: Taxonomy of Study and Review of Techniques
Khan, Shehroz S., Madden, Michael G.
One-class classification (OCC) algorithms aim to build classification models when the negative class is either absent, poorly sampled or not well defined. This unique situation constrains the learning of efficient classifiers, because the class boundary must be defined using knowledge of the positive class only. The OCC problem has been considered and applied under many research themes, such as outlier/novelty detection and concept learning. In this paper we present a unified view of the general problem of OCC by presenting a taxonomy of study for OCC problems, which is based on the availability of training data, the algorithms used and the application domains. We further delve into each of the categories of the proposed taxonomy and present a comprehensive literature review of the OCC algorithms, techniques and methodologies with a focus on their significance, limitations and applications. We conclude our paper by discussing some open research problems in the field of OCC and present our vision for future research.
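To make the OCC setting above concrete, here is a minimal sketch of one very simple boundary-based one-class model, assuming that positives cluster around a single centroid. It is not one of the specific techniques reviewed in the paper; the class name `CentroidOneClass` and the `quantile` parameter are hypothetical. The model is fitted on positive examples only and accepts a new point if it lies within a distance threshold taken from the training distances.

```python
import math
import statistics

class CentroidOneClass:
    """Accept points that lie close to the centroid of the positive class."""

    def fit(self, positives, quantile=0.95):
        dims = len(positives[0])
        self.centroid = [statistics.fmean(p[d] for p in positives) for d in range(dims)]
        distances = sorted(math.dist(p, self.centroid) for p in positives)
        # Use an empirical quantile of the training distances as the radius.
        index = min(len(distances) - 1, int(quantile * len(distances)))
        self.radius = distances[index]
        return self

    def predict(self, point):
        """Return True if the point is accepted as a member of the positive class."""
        return math.dist(point, self.centroid) <= self.radius

# Hypothetical positive-only training set; no negative examples are available.
positives = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.2), (0.9, 0.8)]
model = CentroidOneClass().fit(positives)
print(model.predict((1.05, 1.0)), model.predict((3.0, 3.0)))
```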
Are all training examples equally valuable?
Lapedriza, Agata, Pirsiavash, Hamed, Bylinskii, Zoya, Torralba, Antonio
When learning a new concept, not all training examples may prove equally useful for training: some may have higher or lower training value than others. The goal of this paper is to bring to the attention of the vision community the following considerations: (1) some examples are better than others for training detectors or classifiers, and (2) in the presence of better examples, some examples may negatively impact performance and removing them may be beneficial. In this paper, we propose an approach for measuring the training value of an example, and use it for ranking and greedily sorting examples. We test our methods on different vision tasks, models, datasets and classifiers. Our experiments show that the performance of current state-of-the-art detectors and classifiers can be improved when training on a subset, rather than the whole training set.
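The idea that some examples can hurt performance can be illustrated with a leave-one-out sketch. The abstract above does not specify how training value is measured, so the definition used here (the change in validation accuracy when an example is removed, with a 1-nearest-neighbour classifier on scalar features) is an assumption for illustration only, and the function names are hypothetical.

```python
def nearest_label(train, x):
    """Predict the label of the closest training example (1-nearest neighbour)."""
    return min(train, key=lambda example: abs(example[0] - x))[1]

def accuracy(train, validation):
    return sum(nearest_label(train, x) == y for x, y in validation) / len(validation)

def training_values(train, validation):
    """Value of example i = accuracy with it minus accuracy without it."""
    base = accuracy(train, validation)
    return [base - accuracy(train[:i] + train[i + 1:], validation)
            for i in range(len(train))]

# Hypothetical data: (1.2, "b") sits inside the "a" region and acts as a harmful example.
train = [(0.0, "a"), (1.0, "a"), (1.2, "b"), (3.0, "b"), (4.0, "b")]
validation = [(0.2, "a"), (1.15, "a"), (1.3, "a"), (3.2, "b"), (3.9, "b")]

values = training_values(train, validation)
# Keep only examples whose removal does not improve validation accuracy.
kept = [example for example, value in zip(train, values) if value >= 0]
print(values, accuracy(train, validation), accuracy(kept, validation))
```

Here the subset obtained by dropping the negatively valued example scores higher on the validation set than the full training set, mirroring point (2) of the abstract on a toy scale.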
Soft Rule Ensembles for Supervised Learning
Akdemir, Deniz (Cornell University) | Heslot, Nicolas (Cornell University) | Jannink, Jean-Luc (Limagrain, Europe)
In this article supervised learning problems are solved using soft rule ensembles. First, we review the importance sampling learning ensembles (ISLE) approach that is useful for generating hard rules. Soft rules are obtained with logistic regression using the corresponding hard rules and training data. Soft rule ensembles work well when both the response and the input variables are continuous because soft rules provide smooth transitions around the boundaries of hard rules. Finally, various examples and simulation results are provided to illustrate and evaluate the performance of soft rule ensembles.
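The step from hard to soft rules can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' procedure: a single one-variable rule, a hypothetical threshold of 0.5, and a small L2-penalised logistic regression fitted by plain gradient descent (the penalty keeps the fit finite, since the hard rule separates the data perfectly). The function names and hyperparameters are illustrative.

```python
import math

def hard_rule(x, threshold=0.5):
    """A hard rule: fires (1.0) when x exceeds the threshold, else 0.0."""
    return 1.0 if x > threshold else 0.0

def fit_soft_rule(xs, rule_outputs, l2=0.01, learning_rate=0.5, steps=2000):
    """Fit sigmoid(w * x + b) to the hard rule's 0/1 outputs on the training inputs."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w, grad_b = l2 * w, 0.0
        for x, r in zip(xs, rule_outputs):
            error = 1.0 / (1.0 + math.exp(-(w * x + b))) - r
            grad_w += error * x / n
            grad_b += error / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return lambda x: 1.0 / (1.0 + math.exp(-(w * x + b)))

# Hypothetical training inputs on a grid in [0, 1].
xs = [i / 10 for i in range(11)]
soft_rule = fit_soft_rule(xs, [hard_rule(x) for x in xs])

for x in (0.1, 0.45, 0.55, 0.9):
    print(x, hard_rule(x), round(soft_rule(x), 3))
```

The hard rule jumps from 0 to 1 at the threshold, while the fitted soft rule changes gradually around it, which is the smooth transition the abstract refers to.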
Personalized Human Computation
Organisciak, Peter (University of Illinois at Urbana-Champaign) | Teevan, Jaime (Microsoft Research) | Dumais, Susan (Microsoft Research) | Miller, Robert C. (MIT CSAIL) | Kalai, Adam Tauman (Microsoft Research)
Significant effort in machine learning and information retrieval has been devoted to identifying personalized content such as recommendations and search results. Personalized human computation has the potential to go beyond existing techniques like collaborative filtering to provide personalized results on demand, over personal data, and for complex tasks. This work-in-progress compares two approaches to personalized human computation. In both, users annotate a small set of training examples which are then used by the crowd to annotate unseen items. In the first approach, which we call taste-matching, crowd members are asked to annotate the same set of training examples, and the ratings of similar users on other items are then used to infer personalized ratings. In the second approach, taste-grokking, the crowd is presented with the training examples and asked to use them to predict the ratings of the target user on other items.
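The taste-matching approach described above can be sketched in a few lines. The similarity measure (root-mean-square error on the shared training examples), the choice of averaging the `k` most similar crowd members, and all names and ratings below are assumptions for illustration; the abstract does not specify these details.

```python
import math

def rmse(a, b):
    """Root-mean-square difference between two equally long rating lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def taste_match(target_train, workers_train, workers_other, k=1):
    """Predict the target user's ratings on other items from the k most similar workers."""
    ranked = sorted(workers_train, key=lambda name: rmse(target_train, workers_train[name]))
    nearest = ranked[:k]
    n_items = len(workers_other[nearest[0]])
    return [sum(workers_other[name][i] for name in nearest) / k for i in range(n_items)]

# Hypothetical ratings on a 1-5 scale.
target_train = [5, 1, 4]                      # target user's ratings of 3 training items
workers_train = {"a": [5, 2, 4], "b": [1, 5, 2], "c": [4, 1, 5]}  # same 3 items
workers_other = {"a": [3, 5], "b": [5, 1], "c": [2, 4]}           # 2 unseen items

print(taste_match(target_train, workers_train, workers_other, k=1))  # [3.0, 5.0]
print(taste_match(target_train, workers_train, workers_other, k=2))  # [2.5, 4.5]
```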
Correlated random features for fast semi-supervised learning
McWilliams, Brian, Balduzzi, David, Buhmann, Joachim M.
This paper presents Correlated Nyström Views (XNV), a fast semi-supervised algorithm for regression and classification. The algorithm draws on two main ideas. First, it generates two views consisting of computationally inexpensive random features. Second, XNV applies multiview regression using Canonical Correlation Analysis (CCA) on unlabeled data to bias the regression towards useful features. It has been shown that, if the views contain accurate estimators, CCA regression can substantially reduce variance with a minimal increase in bias. Random views are justified by recent theoretical and empirical work showing that regression with random features closely approximates kernel regression, implying that random views can be expected to contain accurate estimators. We show that XNV consistently outperforms a state-of-the-art algorithm for semi-supervised learning: substantially improving predictive performance and reducing the variability of performance on a wide variety of real-world datasets, whilst also reducing runtime by orders of magnitude.
Para-active learning
Agarwal, Alekh, Bottou, Leon, Dudik, Miroslav, Langford, John
Training examples are not all equally informative. Active learning strategies leverage this observation in order to massively reduce the number of examples that need to be labeled. We leverage the same observation to build a generic strategy for parallelizing learning algorithms. This strategy is effective because the search for informative examples is highly parallelizable and because we show that its performance does not deteriorate when the sifting process relies on a slightly outdated model. Parallel active learning is particularly attractive to train nonlinear models with non-linear representations because there are few practical parallel learning algorithms for such models. We report preliminary experiments using both kernel SVMs and SGD-trained neural networks.
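The parallel sifting idea in the abstract above can be sketched as follows. This is an illustrative simplification, not the authors' algorithm: the selection rule (keep examples whose absolute margin under a linear model falls below a fixed threshold), the thread pool, and the names `sift` and `parallel_sift` are all assumptions. Each worker sifts its own shard using a possibly slightly outdated copy of the model, and only the selected examples are passed on to the learner.

```python
from concurrent.futures import ThreadPoolExecutor

def margin(weights, x):
    """Absolute score of a linear model; small values mean the model is uncertain."""
    return abs(sum(w * xi for w, xi in zip(weights, x)))

def sift(shard, stale_weights, threshold):
    """Keep only the examples that the (possibly outdated) model is unsure about."""
    return [(x, y) for x, y in shard if margin(stale_weights, x) < threshold]

def parallel_sift(shards, stale_weights, threshold):
    """Sift every shard concurrently and concatenate the selected examples."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        results = list(pool.map(lambda shard: sift(shard, stale_weights, threshold), shards))
    return [example for selected in results for example in selected]

# Hypothetical shards of (features, label) pairs and a stale linear model.
shards = [
    [((2.0, 0.0), 1), ((0.1, 0.0), 1)],
    [((-0.2, 1.0), -1), ((-3.0, 0.5), -1)],
]
informative = parallel_sift(shards, stale_weights=(1.0, 0.0), threshold=0.5)
print(informative)
```

Only the low-margin examples reach the (sequential) model update, so the expensive pass over the data is spread across workers.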
Cross-lingual Pseudo-Projected Expectation Regularization for Weakly Supervised Learning
Wang, Mengqiu, Manning, Christopher D.
We consider a multilingual weakly supervised learning scenario where knowledge from annotated corpora in a resource-rich language is transferred via bitext to guide the learning in other languages. Past approaches project labels across bitext and use them as features or gold labels for training. We propose a new method that projects model expectations rather than labels, which facilitates transfer of model uncertainty across language boundaries. We encode expectations as constraints and train a discriminative CRF model using Generalized Expectation Criteria (Mann and McCallum, 2010). Evaluated on standard Chinese-English and German-English NER datasets, our method demonstrates F1 scores of 64% and 60% when no labeled data is used. Attaining the same accuracy with supervised CRFs requires 12k and 1.5k labeled sentences. Furthermore, when combined with labeled examples, our method yields significant improvements over state-of-the-art supervised methods, achieving best reported numbers to date on Chinese OntoNotes and German CoNLL-03 datasets.
Generative Multiple-Instance Learning Models For Quantitative Electromyography
Adel, Tameem, Smith, Benn, Urner, Ruth, Stashuk, Daniel, Lizotte, Daniel J.
We present a comprehensive study of the use of generative modeling approaches for Multiple-Instance Learning (MIL) problems. In MIL a learner receives training instances grouped together into bags with labels for the bags only (which might not be correct for the comprised instances). Our work was motivated by the task of facilitating the diagnosis of neuromuscular disorders using sets of motor unit potential trains (MUPTs) detected within a muscle, which can be cast as a MIL problem. Our approach leads to a state-of-the-art solution to the problem of muscle classification. By introducing and analyzing generative models for MIL in a general framework and examining a variety of model structures and components, our work also serves as a methodological guide to modelling MIL tasks. We evaluate our proposed methods both on MUPT datasets and on the MUSK1 dataset, one of the most widely used benchmarks for MIL.
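As a rough illustration of the MIL setting above, the sketch below implements a deliberately naive generative baseline, not one of the models studied in the paper. It assumes scalar instance features, lets every instance inherit its bag's label, fits one Gaussian per label, and labels a new bag with the class whose Gaussian gives the highest total log-likelihood over the bag's instances. All names and values are hypothetical.

```python
import math
import statistics

def fit_class_gaussians(bags, labels):
    """Pool the instances of all bags sharing a label and fit one Gaussian per label."""
    pooled = {}
    for bag, label in zip(bags, labels):
        pooled.setdefault(label, []).extend(bag)
    # Fall back to a tiny standard deviation if all pooled values are identical.
    return {label: (statistics.fmean(v), statistics.pstdev(v) or 1e-6)
            for label, v in pooled.items()}

def log_pdf(x, mean, std):
    """Log-density of a univariate Gaussian."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)

def classify_bag(bag, params):
    """Return the label with the highest summed instance log-likelihood."""
    return max(params, key=lambda label: sum(log_pdf(x, *params[label]) for x in bag))

# Hypothetical training bags: only the bags carry labels, not the instances.
bags = [[0.1, 0.3, 0.2], [0.0, 0.4], [2.1, 1.9, 2.3], [1.8, 2.2]]
labels = [0, 0, 1, 1]

params = fit_class_gaussians(bags, labels)
print(classify_bag([0.2, 0.25], params), classify_bag([2.0, 2.1], params))
```

Letting instances inherit bag labels ignores the caveat in the abstract that bag labels may be wrong for individual instances; richer generative models address exactly that gap.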
Active Learning with Expert Advice
Zhao, Peilin, Hoi, Steven, Zhuang, Jinfeng
Conventional methods for learning with expert advice assume that a learner always receives the outcome (e.g., class labels) of every incoming training instance at the end of each trial. In real applications, acquiring the outcome from an oracle can be costly or time consuming. In this paper, we address a new problem of active learning with expert advice, where the outcome of an instance is disclosed only when it is requested by the online learner. Our goal is to learn an accurate prediction model while asking the oracle as few questions as possible. To address this challenge, we propose a framework of active forecasters for online active learning with expert advice, which extends two regular forecasters, i.e., the Exponentially Weighted Average Forecaster and the Greedy Forecaster, to the task of active learning with expert advice. We prove that the proposed algorithms satisfy Hannan consistency under suitable assumptions, and validate the efficacy of our technique with an extensive set of experiments.
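The setting above can be sketched with an exponentially weighted average forecaster that only sometimes asks for the outcome. The weight update (multiply each expert's weight by exp(-eta * squared loss)) is the standard form of that forecaster; the query rule used here (ask the oracle only when the experts' advice differs by more than a fixed amount) is a hypothetical stand-in, not the rule proposed in the paper, and the class name and parameters are illustrative.

```python
import math

class ActiveEWAF:
    """Exponentially weighted average forecaster that queries outcomes selectively."""

    def __init__(self, n_experts, eta=1.0, disagreement=0.3):
        self.weights = [1.0] * n_experts
        self.eta = eta
        self.disagreement = disagreement  # hypothetical query threshold
        self.queries = 0

    def predict(self, advice):
        """Weighted average of the experts' predictions."""
        total = sum(self.weights)
        return sum(w * a for w, a in zip(self.weights, advice)) / total

    def should_query(self, advice):
        """Assumed rule: query only when the experts disagree substantially."""
        return max(advice) - min(advice) > self.disagreement

    def step(self, advice, oracle):
        """Forecast, then update the weights only if the outcome is requested."""
        forecast = self.predict(advice)
        if self.should_query(advice):
            outcome = oracle()  # the costly call is made only on request
            self.queries += 1
            self.weights = [w * math.exp(-self.eta * (a - outcome) ** 2)
                            for w, a in zip(self.weights, advice)]
        return forecast

# Hypothetical rounds: (advice from two experts, true outcome).
rounds = [([0.9, 0.1], 1), ([0.8, 0.75], 1), ([0.2, 0.9], 0), ([0.95, 0.3], 1)]
learner = ActiveEWAF(n_experts=2)
for advice, outcome in rounds:
    learner.step(advice, lambda y=outcome: y)
print(learner.queries, learner.weights)
```

In this run the second round is skipped because the experts nearly agree, so the learner pays for fewer outcomes than there are rounds while still shifting weight toward the better expert.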