lapsvm
Large-Scale Sparsified Manifold Regularization
Semi-supervised learning is more powerful than supervised learning by using both labeled and unlabeled data. In particular, the manifold regularization framework, together with kernel methods, leads to the Laplacian SVM (LapSVM) that has demonstrated state-of-the-art performance. However, the LapSVM solution typically involves kernel expansions of all the labeled and unlabeled examples, and is slow on testing. Moreover, existing semi-supervised learning methods, including the LapSVM, can only handle a small number of unlabeled examples. In this paper, we integrate manifold regularization with the core vector machine, which has been used for large-scale supervised and unsupervised learning. By using a sparsified manifold regularizer and formulating as a center-constrained minimum enclosing ball problem, the proposed method produces sparse solutions with low time and space complexities.
Semi-Automatic Data Annotation guided by Feature Space Projection
Benato, Barbara Caroline, Gomes, Jancarlo Ferreira, Telea, Alexandru Cristian, Falcão, Alexandre Xavier
Data annotation using visual inspection (supervision) of each training sample can be laborious. Interactive solutions alleviate this by helping experts propagate labels from a few supervised samples to unlabeled ones based solely on the visual analysis of their feature space projection (with no further sample supervision). We present a semi-automatic data annotation approach based on suitable feature space projection and semi-supervised label estimation. We validate our method on the popular MNIST dataset and on images of human intestinal parasites with and without fecal impurities, a large and diverse dataset that makes classification very hard. We evaluate two approaches for semi-supervised learning from the latent and projection spaces, to choose the one that best reduces user annotation effort and also increases classification accuracy on unseen data. Our results demonstrate the added-value of visual analytics tools that combine complementary abilities of humans and machines for more effective machine learning.
Efficient Cross-Validation for Semi-Supervised Learning
Liu, Yong, Li, Jian, Wu, Guangjun, Ding, Lizhong, Wang, Weiping
Manifold regularization, such as laplacian regularized least squares (LapRLS) and laplacian support vector machine (LapSVM), has been widely used in semi-supervised learning, and its performance greatly depends on the choice of some hyper-parameters. Cross-validation (CV) is the most popular approach for selecting the optimal hyper-parameters, but it has high complexity due to multiple times of learner training. In this paper, we provide a method to approximate the CV for manifold regularization based on a notion of robust statistics, called Bouligand influence function (BIF). We first provide a strategy for approximating the CV via the Taylor expansion of BIF. Then, we show how to calculate the BIF for general loss function,and further give the approximate CV criteria for model selection in manifold regularization. The proposed approximate CV for manifold regularization requires training only once, hence can significantly improve the efficiency of traditional CV. Experimental results show that our approximate CV has no statistical discrepancy with the original one, but much smaller time cost.
CURL: Co-trained Unsupervised Representation Learning for Image Classification
Bianco, Simone, Ciocca, Gianluigi, Cusano, Claudio
Abstract--In this paper we propose a strategy for semi-supervised image classification that leverages unsupervised representation learning and co-training. The strategy, that is called CURL from Co-trained Unsupervised Representation Learning, iteratively builds two classifiers on two different views of the data. The two views correspond to different representations learned from both labeled and unlabeled data and differ in the fusion scheme used to combine the image features. T o assess the performance of our proposal, we conducted several experiments on widely used data sets for scene and object recognition. We considered three scenarios (inductive, transductive and self-taught learning) that differ in the strategy followed to exploit the unlabeled data. As image features we considered a combination of GIST, PHOG, and LBP as well as features extracted from a Con-volutional Neural Network. Moreover, two embodiments of CURL are investigated: one using Ensemble Projection as unsupervised representation learning coupled with Logistic Regression, and one based on LapSVM. The results show that CURL clearly outperforms other supervised and semi-supervised learning methods in the state of the art. Semi-supervised learning [1] consists in taking into account both labeled and unlabeled data when training machine learning models. It is particularly effective when there is plenty of training data, but only a few instances are labeled. In the last years, many semi-supervised learning approaches have been proposed including generative methods [2], [3], graph-based methods [4], [5], and methods based on Support V ector Machines [6], [7]. Co-training is another example of semi-supervised technique [8].
Laplacian Support Vector Machines Trained in the Primal
Melacci, Stefano, Belkin, Mikhail
In the last few years, due to the growing ubiquity of unlabeled data, much effort has been spent by the machine learning community to develop better understanding and improve the quality of classifiers exploiting unlabeled data. Following the manifold regularization approach, Laplacian Support Vector Machines (LapSVMs) have shown the state of the art performance in semi--supervised classification. In this paper we present two strategies to solve the primal LapSVM problem, in order to overcome some issues of the original dual formulation. Whereas training a LapSVM in the dual requires two steps, using the primal form allows us to collapse training to a single step. Moreover, the computational complexity of the training algorithm is reduced from O(n^3) to O(n^2) using preconditioned conjugate gradient, where n is the combined number of labeled and unlabeled examples. We speed up training by using an early stopping strategy based on the prediction on unlabeled data or, if available, on labeled validation examples. This allows the algorithm to quickly compute approximate solutions with roughly the same classification accuracy as the optimal ones, considerably reducing the training time. Due to its simplicity, training LapSVM in the primal can be the starting point for additional enhancements of the original LapSVM formulation, such as those for dealing with large datasets. We present an extensive experimental evaluation on real world data showing the benefits of the proposed approach.
Large-Scale Sparsified Manifold Regularization
Tsang, Ivor W., Kwok, James T.
Semi-supervised learning is more powerful than supervised learning by using both labeled and unlabeled data. In particular, the manifold regularization framework, together with kernel methods, leads to the Laplacian SVM (LapSVM) that has demonstrated state-of-the-art performance. However, the LapSVM solution typically involves kernel expansions of all the labeled and unlabeled examples, and is slow on testing. Moreover, existing semi-supervised learning methods, including the LapSVM, can only handle a small number of unlabeled examples. In this paper, we integrate manifold regularization with the core vector machine, which has been used for large-scale supervised and unsupervised learning. By using a sparsified manifold regularizer and formulating as a center-constrained minimum enclosing ball problem, the proposed method produces sparse solutions with low time and space complexities. Experimental results show that it is much faster than the LapSVM, and can handle a million unlabeled examples on a standard PC; while the LapSVM can only handle several thousand patterns.
Large-Scale Sparsified Manifold Regularization
Tsang, Ivor W., Kwok, James T.
Semi-supervised learning is more powerful than supervised learning by using both labeled and unlabeled data. In particular, the manifold regularization framework, together with kernel methods, leads to the Laplacian SVM (LapSVM) that has demonstrated state-of-the-art performance. However, the LapSVM solution typically involves kernel expansions of all the labeled and unlabeled examples, and is slow on testing. Moreover, existing semi-supervised learning methods, including the LapSVM, can only handle a small number of unlabeled examples. In this paper, we integrate manifold regularization with the core vector machine, which has been used for large-scale supervised and unsupervised learning. By using a sparsified manifold regularizer and formulating as a center-constrained minimum enclosing ball problem, the proposed method produces sparse solutions with low time and space complexities. Experimental results show that it is much faster than the LapSVM, and can handle a million unlabeled examples on a standard PC; while the LapSVM can only handle several thousand patterns.
Large-Scale Sparsified Manifold Regularization
Tsang, Ivor W., Kwok, James T.
Semi-supervised learning is more powerful than supervised learning by using both labeled and unlabeled data. In particular, the manifold regularization framework, together with kernel methods, leads to the Laplacian SVM (LapSVM) that has demonstrated state-of-the-art performance. However, the LapSVM solution typically involveskernel expansions of all the labeled and unlabeled examples, and is slow on testing. Moreover, existing semi-supervised learning methods, including theLapSVM, can only handle a small number of unlabeled examples. In this paper, we integrate manifold regularization with the core vector machine, which has been used for large-scale supervised and unsupervised learning. By using a sparsified manifold regularizer and formulating as a center-constrained minimum enclosing ball problem, the proposed method produces sparse solutions with low time and space complexities. Experimental results show that it is much faster than the LapSVM, and can handle a million unlabeled examples on a standard PC; while the LapSVM can only handle several thousand patterns.