The problem of incomplete labels is frequently encountered in many application domains where the training labels are obtained via crowd-sourcing. The label incompleteness significantly increases the difficulty of acquiring accurate multi-label prediction models. In this paper, we propose a novel semi-supervised multi-label method that integrates low-rank label matrix recovery into the manifold regularized vector-valued prediction framework to address multi-label learning with incomplete labels. The proposed method is formulated as a convex but non-smooth joint optimization problem over the latent label matrix and the prediction model parameters. We then develop a fast proximal gradient descent with continuation algorithm to solve it for a global optimal solution. The efficacy of the proposed approach is demonstrated on multiple multi-label datasets, comparing to related methods that handle incomplete labels.
To explore underlying complementary information from multiple views, in this paper, we propose a novel Latent Multi-view Semi-Supervised Classification (LMSSC) method. Unlike most existing multi-view semi-supervised classification methods that learn the graph using original features, our method seeks an underlying latent representation and performs graph learning and label propagation based on the learned latent representation. With the complementarity of multiple views, the latent representation could depict the data more comprehensively than every single view individually, accordingly making the graph more accurate and robust as well. Finally, LMSSC integrates latent representation learning, graph construction, and label propagation into a unified framework, which makes each subtask optimized. Experimental results on real-world benchmark datasets validate the effectiveness of our proposed method.
Few-shot classification (FSC) is challenging due to the scarcity of labeled training data (e.g. Meta-learning has shown to achieve promising results by learning to initialize a classification model for FSC. In this paper we propose a novel semi-supervised meta-learning method called learning to self-train (LST) that leverages unlabeled data and specifically meta-learns how to cherry-pick and label such unsupervised data to further improve performance. To this end, we train the LST model through a large number of semi-supervised few-shot tasks. On each task, we train a few-shot model to predict pseudo labels for unlabeled data, and then iterate the self-training steps on labeled and pseudo-labeled data with each step followed by fine-tuning.
We introduce a novel semi-supervised version of the least squares classifier. This implicitly constrained least squares (ICLS) classifier minimizes the squared loss on the labeled data among the set of parameters implied by all possible labelings of the unlabeled data. Unlike other discriminative semi-supervised methods, our approach does not introduce explicit additional assumptions into the objective function, but leverages implicit assumptions already present in the choice of the supervised least squares classifier. We show this approach can be formulated as a quadratic programming problem and its solution can be found using a simple gradient descent procedure. We prove that, in a certain way, our method never leads to performance worse than the supervised classifier. Experimental results corroborate this theoretical result in the multidimensional case on benchmark datasets, also in terms of the error rate.
Practically, we are often in the dilemma that the labeled data at hand are inadequate to train a reliable classifier, and more seriously, some of these labeled data may be mistakenly labeled due to the various human factors. Therefore, this paper proposes a novel semi-supervised learning paradigm that can handle both label insufficiency and label inaccuracy. To address label insufficiency, we use a graph to bridge the data points so that the label information can be propagated from the scarce labeled examples to unlabeled examples along the graph edges. To address label inaccuracy, Graph Trend Filtering (GTF) and Smooth Eigenbase Pursuit (SEP) are adopted to filter out the initial noisy labels. GTF penalizes the l_0 norm of label difference between connected examples in the graph and exhibits better local adaptivity than the traditional l_2 norm-based Laplacian smoother. SEP reconstructs the correct labels by emphasizing the leading eigenvectors of Laplacian matrix associated with small eigenvalues, as these eigenvectors reflect real label smoothness and carry rich class separation cues. We term our algorithm as `Semi-supervised learning under Inadequate and Incorrect Supervision' (SIIS). Thorough experimental results on image classification, text categorization, and speech recognition demonstrate that our SIIS is effective in label error correction, leading to superior performance to the state-of-the-art methods in the presence of label noise and label scarcity.