We study the sample complexity of semi-supervised learning (SSL) and introduce new assumptions based on the mismatch between a mixture model learned from unlabeled data and the true mixture model induced by the (unknown) class conditional distributions. Under these assumptions, we establish an $\Omega(K\log K)$ labeled sample complexity bound without imposing parametric assumptions, where $K$ is the number of classes. Our results suggest that even in nonparametric settings it is possible to learn a near-optimal classifier using only a few labeled samples. Unlike previous theoretical work which focuses on binary classification, we consider general multiclass classification ($K>2$), which requires solving a difficult permutation learning problem. This permutation defines a classifier whose classification error is controlled by the Wasserstein distance between mixing measures, and we provide finite-sample results characterizing the behaviour of the excess risk of this classifier. Finally, we describe three algorithms for computing these estimators based on a connection to bipartite graph matching, and perform experiments to illustrate the superiority of the MLE over the majority vote estimator.
Transfer learning is a very important tool in deep learning as it allows propagating information from one "source dataset" to another "target dataset", especially in the case of a small number of training examples in the latter. Yet, discrepancies between the underlying distributions of the source and target data are commonplace and are known to have a substantial impact on algorithm performance. In this work we suggest a novel information theoretic approach for the analysis of the performance of deep neural networks in the context of transfer learning. We focus on the task of semi-supervised transfer learning, in which unlabeled samples from the target dataset are available during the network training on the source dataset. Our theory suggests that one may improve the transferability of a deep neural network by imposing a Lautum information based regularization that relates the network weights to the target data. We demonstrate in various transfer learning experiments the effectiveness of the proposed approach.
This paper addresses the general problem of accurate identification of oil reservoirs. Recent improvements in well or borehole logging technology have resulted in an explosive amount of data available for processing. The traditional methods of analysis of the logs characteristics by experts require significant amount of time and money and is no longer practicable. In this paper, we use the semi-supervised learning to solve the problem of ever-increasing amount of unlabelled data available for interpretation. The experts are needed to label only a small amount of the log data. The neural network classifier is first trained with the initial labelled data. Next, batches of unlabelled data are being classified and the samples with the very high class probabilities are being used in the next training session, bootstrapping the classifier. The process of training, classifying, enhancing the labelled data is repeated iteratively until the stopping criteria are met, that is, no more high probability samples are found. We make an empirical study on the well data from Jianghan oil field and test the performance of the neural network semi-supervised classifier. We compare this method with other classifiers. The comparison results show that our neural network semi-supervised classifier is superior to other classification methods.
Semi-supervised learning methods are motivated by the relative paucity of labeled data and aim to utilize large sources of unlabeled data to improve predictive tasks. It has been noted, however, such improvements are not guaranteed in general in some cases the unlabeled data impairs the performance. A fundamental source of error comes from restrictive assumptions about the unlabeled features. In this paper, we develop a semi-supervised learning approach that relaxes such assumptions and is robust with respect to labels missing at random. The approach ensures that uncertainty about the classes is propagated to the unlabeled features in a robust manner. It is applicable using any generative model with associated learning algorithm. We illustrate the approach using both standard synthetic data examples and the MNIST data with unlabeled adversarial examples.
We present a new perspective on the popular multi-class algorithmic techniques of one-vs-all and error correcting output codes. Rather than studying the behavior of these techniques for supervised learning, we establish a connection between the success of these methods and the existence of label-efficient learning procedures. We show that in both the realizable and agnostic cases, if output codes are successful at learning from labeled data, they implicitly assume structure on how the classes are related. By making that structure explicit, we design learning algorithms to recover the classes with low label complexity. We provide results for the commonly studied cases of one-vs-all learning and when the codewords of the classes are well separated. We additionally consider the more challenging case where the codewords are not well separated, but satisfy a boundary features condition that captures the natural intuition that every bit of the codewords should be significant.