Inductive Learning
Prototypical Networks for Multi-Label Learning
Yang, Zhuo, Han, Yufei, Yu, Guoxian, Zhang, Xiangliang
We propose to address multi-label learning by jointly estimating the distribution of positive and negative instances for all labels. Through a shared mapping function, each label's positive and negative instances are mapped into a new space, forming a mixture distribution of two components (positive and negative). Due to the dependency among labels, positive instances are mapped close together if they share common labels, while positive and negative embeddings of the same label are pushed apart. The distribution is learned in the new space and thus represents well both the distance between instances in their original feature space and their common membership w.r.t. different categories. By measuring the density function values, the membership of new instances mapped to the new space in possibly multiple categories can be easily identified. We use neural networks for learning the mapping function and use the expectations of the positive and negative embeddings as prototypes of the positive and negative components for each label, respectively. Therefore, we name our proposed method PNML (prototypical networks for multi-label learning). Extensive experiments verify that PNML significantly outperforms state-of-the-art methods.
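The prototype idea in this abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not in the abstract: the embeddings are given directly (no neural network is trained), every label has at least one positive and one negative instance, and a label is assigned when the instance is closer to the positive prototype than to the negative one, which stands in for the density comparison the authors describe. The function names are hypothetical.

```python
def mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_prototypes(embeddings, label_sets, labels):
    """For each label, average the embeddings of its positive and negative instances."""
    prototypes = {}
    for label in labels:
        pos = [e for e, ls in zip(embeddings, label_sets) if label in ls]
        neg = [e for e, ls in zip(embeddings, label_sets) if label not in ls]
        prototypes[label] = (mean(pos), mean(neg))
    return prototypes


def predict(embedding, prototypes):
    """Assign every label whose positive prototype is closer than its negative one.

    Assumption: nearest-prototype distance replaces the density evaluation.
    """
    return {
        label
        for label, (pos, neg) in prototypes.items()
        if sq_dist(embedding, pos) < sq_dist(embedding, neg)
    }


# Toy data: five already-embedded instances, two labels, one instance with both.
embeddings = [[0, 0], [0, 1], [5, 5], [5, 6], [0, 5]]
label_sets = [{"a"}, {"a"}, {"b"}, {"b"}, {"a", "b"}]
prototypes = fit_prototypes(embeddings, label_sets, ["a", "b"])
print(predict([1, 5], prototypes))
```

Each label is decided independently against its own pair of prototypes, which is what lets a single instance receive several labels.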
An Empirical and Comparative Analysis of Data Valuation with Scalable Algorithms
Jia, Ruoxi, Sun, Xuehui, Xu, Jiacen, Zhang, Ce, Li, Bo, Song, Dawn
This paper focuses on valuating training data for supervised learning tasks and studies the Shapley value, a data value notion originating in cooperative game theory. The Shapley value defines a unique value distribution scheme that satisfies a set of appealing properties desired by a data value notion. However, the Shapley value requires exponential complexity to calculate exactly. Existing approximation algorithms, although achieving great improvement over the exact algorithm, rely on retraining models multiple times, and thus remain limited when applied to larger-scale learning tasks and real-world datasets. In this work, we develop a simple and efficient heuristic for data valuation based on the Shapley value with complexity independent of the model size. The key idea is to approximate the model via a $K$-nearest neighbor ($K$NN) classifier, which has a locality structure that can lead to efficient Shapley value calculation. We evaluate the utility of the values produced by the $K$NN proxies in various settings, including label noise correction, watermark detection, data summarization, active data acquisition, and domain adaptation. Extensive experiments demonstrate that our algorithm achieves at least comparable utility to the values produced by existing algorithms while offering a significant efficiency improvement. Moreover, we theoretically analyze the Shapley value and justify its advantage over the leave-one-out error as a data value measure.
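A small sketch can show why the locality of a $K$NN classifier makes Shapley values cheap to compute. The abstract does not state the formula. The recursion below is an assumption taken from the KNN-Shapley literature for a single test point and an unweighted $K$NN utility, and it is cross-checked against a brute-force computation that enumerates all orderings. The function names and the `matches` encoding are hypothetical.

```python
from itertools import permutations
from math import factorial


def knn_utility(subset, matches, k):
    """Fraction of the k nearest votes in `subset` that carry the test label.

    `subset` holds ranks (0 = nearest to the test point); matches[r] is 1 when
    the training point at rank r has the same label as the test point.
    """
    nearest = sorted(subset)[:k]
    return sum(matches[r] for r in nearest) / k


def knn_shapley(matches, k):
    """Recursion over distance-sorted points, from farthest to nearest (assumed form)."""
    n = len(matches)
    values = [0.0] * n
    values[n - 1] = matches[n - 1] / n
    for i in range(n - 2, -1, -1):
        rank = i + 1  # 1-based rank of the point at index i
        step = (matches[i] - matches[i + 1]) / k * min(k, rank) / rank
        values[i] = values[i + 1] + step
    return values


def brute_force_shapley(matches, k):
    """Exact Shapley values by averaging marginal gains over all n! orderings."""
    n = len(matches)
    values = [0.0] * n
    for order in permutations(range(n)):
        seen = []
        for r in order:
            before = knn_utility(seen, matches, k)
            seen.append(r)
            values[r] += knn_utility(seen, matches, k) - before
    return [v / factorial(n) for v in values]


matches = [1, 0, 1, 1, 0]  # sorted by distance to the test point
print(knn_shapley(matches, k=2))
print(brute_force_shapley(matches, k=2))
```

After sorting, the recursion makes a single pass over the training points, whereas the brute-force version grows factorially with the number of points and is only usable on toy inputs like this one.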
Self-supervised representation learning from electroencephalography signals
Banville, Hubert, Albuquerque, Isabela, Hyvärinen, Aapo, Moffat, Graeme, Engemann, Denis-Alexander, Gramfort, Alexandre
The supervised learning paradigm is limited by the cost - and sometimes the impracticality - of data collection and labeling in multiple domains. Self-supervised learning, a paradigm which exploits the structure of unlabeled data to create learning problems that can be solved with standard supervised approaches, has shown great promise as a pretraining or feature learning approach in fields like computer vision and time series processing. In this work, we present self-supervision strategies that can be used to learn informative representations from multivariate time series. One successful approach relies on predicting whether time windows are sampled from the same temporal context or not. As demonstrated on a clinically relevant task (sleep scoring) and with two electroencephalography datasets, our approach outperforms a purely supervised approach in low data regimes, while capturing important physiological information without any access to labels.
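The pretext task mentioned in this abstract, predicting whether two time windows come from the same temporal context, can be sketched as a pair sampler. The parameter names `tau_pos` and `tau_neg`, the even split between positive and negative pairs, and the measurement of distance in window indices are all assumptions made for illustration. The abstract does not specify them.

```python
import random


def sample_pair(n_windows, tau_pos, tau_neg, rng):
    """Draw (anchor, other, label) for a same-context pretext task.

    label 1: the two windows are at most tau_pos indices apart.
    label 0: the two windows are at least tau_neg indices apart.
    """
    if tau_pos < 1 or n_windows < 2 * tau_neg + 1:
        raise ValueError("recording too short for the requested contexts")
    anchor = rng.randrange(n_windows)
    if rng.random() < 0.5:
        label = 1
        candidates = [j for j in range(n_windows)
                      if j != anchor and abs(j - anchor) <= tau_pos]
    else:
        label = 0
        candidates = [j for j in range(n_windows)
                      if abs(j - anchor) >= tau_neg]
    return anchor, rng.choice(candidates), label


rng = random.Random(0)
pairs = [sample_pair(100, tau_pos=3, tau_neg=20, rng=rng) for _ in range(5)]
print(pairs)
```

A classifier trained on such pairs needs no sleep-stage labels, since the target comes from the timing of the windows alone.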
Negative sampling in semi-supervised learning
Chen, John, Shah, Vatsal, Kyrillidis, Anastasios
We introduce Negative Sampling in Semi-Supervised Learning (NS3L), a simple, fast, easy-to-tune algorithm for semi-supervised learning (SSL). NS3L is motivated by the success of negative sampling/contrastive estimation. We demonstrate that adding the NS3L loss to state-of-the-art SSL algorithms, such as Virtual Adversarial Training (VAT), significantly improves upon vanilla VAT and its variant, VAT with Entropy Minimization. By adding the NS3L loss to MixMatch, the current state-of-the-art approach on semi-supervised tasks, we observe significant improvements over vanilla MixMatch. We conduct extensive experiments on the CIFAR10, CIFAR100, SVHN and STL10 benchmark datasets.
Semi-supervised Wrapper Feature Selection with Imperfect Labels
Feofanov, Vasilii, Amini, Massih-Reza, Devijver, Emilie
In this paper, we propose a new wrapper approach for semi-supervised feature selection. A common strategy in semi-supervised learning is to augment the training set with pseudo-labeled unlabeled examples. However, the pseudo-labeling procedure is prone to error and has a high risk of disrupting the learning algorithm with additional noisy labeled training data. To overcome this, we propose to explicitly model the mislabeling error during the learning phase with the overall aim of selecting the most relevant feature characteristics. We derive a $\mathcal{C}$-bound for Bayes classifiers trained over partially labeled training sets by taking into account the mislabeling errors. The risk bound is then considered as an objective function that is minimized over the space of possible feature subsets using a genetic algorithm. To produce a solution that is both sparse and accurate, we propose a modification of a genetic algorithm with the crossover based on feature weights and recursive elimination of irrelevant features. Empirical results on different data sets show the effectiveness of our framework compared to several state-of-the-art semi-supervised feature selection approaches.
14 Different Types of Learning in Machine Learning
The use of an environment means that there is no fixed training dataset, but rather a goal or set of goals that an agent is required to achieve, actions it may perform, and feedback about performance toward the goal. Some machine learning algorithms do not just experience a fixed dataset. For example, reinforcement learning algorithms interact with an environment, so there is a feedback loop between the learning system and its experiences.
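The feedback loop described here can be sketched in a few lines. Everything in the example is hypothetical: a two-lever environment with fixed payouts, and an agent that starts with optimistic value estimates and always picks the action it currently rates highest. It is meant only to show that the data the agent learns from depends on the actions it chose, not to represent any particular reinforcement learning algorithm.

```python
class TwoLeverEnv:
    """Hypothetical environment: lever 1 pays 1.0, lever 0 pays nothing."""

    def step(self, action):
        return 1.0 if action == 1 else 0.0


class GreedyAgent:
    """Keeps a running mean reward per action and acts greedily on it."""

    def __init__(self, n_actions, optimistic_start=1.0):
        # Optimistic starting values make the agent try each action at least once.
        self.estimates = [optimistic_start] * n_actions
        self.counts = [0] * n_actions

    def act(self):
        # Ties are broken in favor of the lowest action index.
        return max(range(len(self.estimates)), key=self.estimates.__getitem__)

    def learn(self, action, reward):
        self.counts[action] += 1
        self.estimates[action] += (reward - self.estimates[action]) / self.counts[action]


def run(env, agent, n_steps):
    """Agent acts, environment responds, agent updates: the feedback loop."""
    total = 0.0
    for _ in range(n_steps):
        action = agent.act()
        reward = env.step(action)
        agent.learn(action, reward)
        total += reward
    return total


agent = GreedyAgent(n_actions=2)
total_reward = run(TwoLeverEnv(), agent, n_steps=10)
print(total_reward, agent.counts)
```

There is no dataset anywhere in this loop. The only training signal is the reward returned for the actions the agent itself selected.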
Meta Label Correction for Learning with Weak Supervision
Zheng, Guoqing, Awadallah, Ahmed Hassan, Dumais, Susan
Leveraging weak or noisy supervision for building effective machine learning models has long been an important research problem. The growing need for large-scale datasets to train deep learning models has increased its importance. Weak or noisy supervision could originate from multiple sources including non-expert annotators or automatic labeling based on heuristics or user interaction signals. Previous work on modeling and correcting weak labels has focused on various aspects, including loss correction, training instance re-weighting, etc. In this paper, we approach this problem from a novel perspective based on meta-learning. We view the label correction procedure as a meta-process and propose a new meta-learning based framework termed MLC for learning with weak supervision. Experiments with different label noise levels on multiple datasets show that MLC can achieve large improvements over previous methods incorporating weak labels for learning.
How bad is worst-case data if you know where it comes from?
Chen, Justin Y., Valiant, Gregory, Valiant, Paul
We introduce a framework for studying how distributional assumptions on the process by which data is partitioned into a training and test set can be leveraged to provide accurate estimation or learning algorithms, even for worst-case datasets. We consider a setting of $n$ datapoints, $x_1,\ldots,x_n$, together with a specified distribution, $P$, over partitions of these datapoints into a training set, test set, and irrelevant set. An algorithm takes as input a description of $P$ (or sample access), the indices of the test and training sets, and the datapoints in the training set, and returns a model or estimate that will be evaluated on the datapoints in the test set. We evaluate an algorithm in terms of its worst-case expected performance: the expected performance over potential test/training sets, for worst-case datapoints, $x_1,\ldots,x_n$. This framework is a departure from more typical distributional assumptions on the datapoints (e.g. that data is drawn independently, or according to an exchangeable process), and can model a number of natural data collection processes, including processes with dependencies such as "snowball sampling" and "chain sampling", and settings where test and training sets satisfy chronological constraints (e.g. the test instances were observed after the training instances). Within this framework, we consider the setting where datapoints are bounded real numbers, and the goal is to estimate the mean of the test set. We give an efficient algorithm that returns a weighted combination of the training set (whose weights depend on the distribution, $P$, and on the training and test set indices) and show that the worst-case expected error achieved by this algorithm is at most a multiplicative $\pi/2$ factor worse than that of the optimal such algorithm. The algorithm, and its proof, leverage a surprising connection to the Grothendieck problem.
Semi-Supervised Method using Gaussian Random Fields for Boilerplate Removal in Web Browsers
Boilerplate removal refers to the problem of removing noisy content, such as ads, from a webpage and extracting relevant content that can be used by various services. This can be useful for several web browser features, such as ad blocking, accessibility tools (for example, read-aloud), translation, and summarization. Manually labeling or tagging webpage data to create a training dataset for boilerplate detection and removal can be tedious and time-consuming. Hence, a semi-supervised model, in which some of the webpage elements are labeled manually and labels for others are inferred based on some parameters, can be useful. In this paper we present a solution for extraction of relevant content from a webpage that relies on semi-supervised learning using Gaussian Random Fields. We first represent the webpage as a graph, with text elements as nodes and the edge weights representing similarity between nodes. After this, we label a few nodes in the graph using heuristics and label the remaining nodes by a weighted measure of similarity to the already labeled nodes. We describe the system architecture and a few preliminary results on a dataset of webpages.
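The graph-based labeling step can be sketched as iterative label propagation. This is a minimal sketch under assumptions: the abstract does not give the update rule, so the code uses the standard harmonic formulation for Gaussian random fields, in which each unlabeled node repeatedly takes the similarity-weighted average of its neighbors' scores while seed nodes stay fixed. The function name, the 0.5 starting score, and the toy graph are hypothetical.

```python
def propagate_labels(weights, seed_labels, n_iters=200):
    """Spread seed labels over a similarity graph.

    weights: symmetric n x n list of lists of non-negative similarities.
    seed_labels: dict mapping node index to 1.0 (content) or 0.0 (boilerplate).
    Returns one score per node; seed nodes keep their given value.
    """
    n = len(weights)
    scores = [seed_labels.get(i, 0.5) for i in range(n)]
    for _ in range(n_iters):
        updated = scores[:]
        for i in range(n):
            if i in seed_labels:
                continue  # heuristically labeled nodes are never overwritten
            total = sum(weights[i][j] for j in range(n) if j != i)
            if total > 0:
                weighted = sum(weights[i][j] * scores[j] for j in range(n) if j != i)
                updated[i] = weighted / total
        scores = updated
    return scores


# Toy page: four text elements in a chain, the first seeded as content,
# the last seeded as boilerplate.
weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = propagate_labels(weights, {0: 1.0, 3: 0.0})
print(scores)
```

Thresholding the resulting scores (for example at 0.5) would give a content or boilerplate decision for each unlabeled element. The threshold is likewise an assumption rather than something the abstract states.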