Unsupervised or Indirectly Supervised Learning
Semi-Supervised Learning with Manifold Fitted Graphs
Zhang, Tongtao (Columbia University) | Ji, Rongrong (Xiamen University) | Liu, Wei (IBM Research) | Tao, Dacheng (University of Technology, Sydney) | Hua, Gang (Stevens Institute of Technology)
In this paper, we propose a locality-constrained and sparsity-encouraged manifold fitting approach, aiming at capturing the locally sparse manifold structure into neighborhood graph construction by exploiting a principled optimization model. The proposed model formulates neighborhood graph construction as a sparse coding problem with the locality constraint, therefore achieving simultaneous neighbor selection and edge weight optimization. The core idea underlying our model is to perform a sparse manifold fitting task for each data point so that close-by points lying on the same local manifold are automatically chosen to connect and meanwhile the connection weights are acquired by simple geometric reconstruction. We term the novel neighborhood graph generated by our proposed optimization model M-Fitted Graph since such a graph stems from sparse manifold fitting. To evaluate the robustness and effectiveness ofM -fitted graphs, we leverage graph-based semisupervised learning as the testbed. Extensive experiments carried out on six benchmark datasets validate that the proposed M -fitted graph is superior to state-of-the-art neighborhood graphs in terms of classification accuracy using popular graph-based semi-supervised learning methods.
Adaptive Loss Minimization for Semi-Supervised Elastic Embedding
Nie, Feiping (University of Texas at Arlington) | Wang, Hua (Department of Electrical Engineering and Computer Science) | Huang, Heng (University of Texas at Arlington) | Ding, Chris (University of Texas at Arlington)
The semi-supervised learning usually only predict labels for unlabeled data appearing in training data, and cannot effectively predict labels for testing data never appearing in training set. To handle this out-of-sample problem, many inductive methods make a constraint such that the predicted label matrix should be exactly equal to a linear model. In practice, this constraint is too rigid to capture the manifold structure of data. Motivated by this deficiency, we relax the rigid linear embedding constraint and propose to use an elastic embedding constraint on the predicted label matrix such that the manifold structure can be better explored. To solve our new objective and also a more general optimization problem, we study a novel adaptive loss with efficient optimization algorithm. Our new adaptive loss minimization method takes the advantages of both L1 norm and L2 norm, and is robust to the data outlier under Laplacian distribution and can efficiently learn the normal data under Gaussian distribution. Experiments have been performed on image classification tasks and our approach outperforms other state-of-the-art methods.
Exploring the Contribution of Unlabeled Data in Financial Sentiment Analysis
Ren, Jimmy SJ. (City University of Hong Kong) | Wang, Wei (City University of Hong Kong) | Wang, Jiawei (USTC-CityU Joint Advanced Research Centre) | Liao, Stephen (City University of Hong Kong)
With the proliferation of its applications in various industries, sentiment analysis by using publicly available web data has become an active research area in text classification during these years. It is argued by researchers that semi-supervised learning is an effective approach to this problem since it is capable to mitigate the manual labeling effort which is usually expensive and time-consuming. However, there was a long-term debate on the effectiveness of unlabeled data in text classification. This was partially caused by the fact that many assumptions in theoretic analysis often do not hold in practice. We argue that this problem may be further understood by adding an additional dimension in the experiment. This allows us to address this problem in the perspective of bias and variance in a broader view. We show that the well-known performance degradation issue caused by unlabeled data can be reproduced as a subset of the whole scenario. We argue that if the bias-variance trade-off is to be better balanced by a more effective feature selection method unlabeled data is very likely to boost the classification performance. We then propose a feature selection framework in which labeled and unlabeled training samples are both considered. We discuss its potential in achieving such a balance. Besides, the application in financial sentiment analysis is chosen because it not only exemplifies an important application, the data possesses better illustrative power as well. The implications of this study in text classification and financial sentiment analysis are both discussed.
Privileged Information for Data Clustering
Many machine learning algorithms assume that all input samples are independently and identically distributed from some common distribution on either the input space X, in the case of unsupervised learning, or the input and output space X x Y in the case of supervised and semi-supervised learning. In the last number of years the relaxation of this assumption has been explored and the importance of incorporation of additional information within machine learning algorithms became more apparent. Traditionally such fusion of information was the domain of semi-supervised learning. More recently the inclusion of knowledge from separate hypothetical spaces has been proposed by Vapnik as part of the supervised setting. In this work we are interested in exploring Vapnik's idea of master-class learning and the associated learning using privileged information, however within the unsupervised setting. Adoption of the advanced supervised learning paradigm for the unsupervised setting instigates investigation into the difference between privileged and technical data. By means of our proposed aRi-MAX method stability of the KMeans algorithm is improved and identification of the best clustering solution is achieved on an artificial dataset. Subsequently an information theoretic dot product based algorithm called P-Dot is proposed. This method has the ability to utilize a wide variety of clustering techniques, individually or in combination, while fusing privileged and technical data for improved clustering. Application of the P-Dot method to the task of digit recognition confirms our findings in a real-world scenario.
Fusion with Diffusion for Robust Visual Tracking
Zhou, Yu, Bai, Xiang, Liu, Wenyu, Latecki, Longin J.
A weighted graph is used as an underlying structure of many algorithms like semi-supervised learning and spectral clustering. The edge weights are usually deter-mined by a single similarity measure, but it often hard if not impossible to capture all relevant aspects of similarity when using a single similarity measure. In par-ticular, in the case of visual object matching it is beneficial to integrate different similarity measures that focus on different visual representations. In this paper, a novel approach to integrate multiple similarity measures is pro-posed. First pairs of similarity measures are combined with a diffusion process on their tensor product graph (TPG). Hence the diffused similarity of each pair of ob-jects becomes a function of joint diffusion of the two original similarities, which in turn depends on the neighborhood structure of the TPG. We call this process Fusion with Diffusion (FD). However, a higher order graph like the TPG usually means significant increase in time complexity. This is not the case in the proposed approach. A key feature of our approach is that the time complexity of the dif-fusion on the TPG is the same as the diffusion process on each of the original graphs, Moreover, it is not necessary to explicitly construct the TPG in our frame-work. Finally all diffused pairs of similarity measures are combined as a weighted sum. We demonstrate the advantages of the proposed approach on the task of visual tracking, where different aspects of the appearance similarity between the target object in frame t and target object candidates in frame t+1 are integrated. The obtained method is tested on several challenge video sequences and the experimental results show that it outperforms state-of-the-art tracking methods.
Forging The Graphs: A Low Rank and Positive Semidefinite Graph Learning Approach
Luo, Dijun, Huang, Heng, Nie, Feiping, Ding, Chris H.
In many graph-based machine learning and data mining approaches, the quality of the graph is critical. However, in real-world applications, especially in semi-supervised learning and unsupervised learning, the evaluation of the quality of a graph is often expensive and sometimes even impossible, due the cost or the unavailability of ground truth. In this paper, we proposed a robust approach with convex optimization to ``forge'' a graph: with an input of a graph, to learn a graph with higher quality. Our major concern is that an ideal graph shall satisfy all the following constraints: non-negative, symmetric, low rank, and positive semidefinite. We develop a graph learning algorithm by solving a convex optimization problem and further develop an efficient optimization to obtain global optimal solutions with theoretical guarantees. With only one non-sensitive parameter, our method is shown by experimental results to be robust and achieve higher accuracy in semi-supervised learning and clustering under various settings. As a preprocessing of graphs, our method has a wide range of potential applications machine learning and data mining.
The topographic unsupervised learning of natural sounds in the auditory cortex
Terashima, Hiroki, Okada, Masato
The computational modelling of the primary auditory cortex (A1) has been less fruitful than that of the primary visual cortex (V1) due to the less organized properties of A1. Greater disorder has recently been demonstrated for the tonotopy of A1 that has traditionally been considered to be as ordered as the retinotopy of V1. This disorder appears to be incongruous, given the uniformity of the neocortex; however, we hypothesized that both A1 and V1 would adopt an efficient coding strategy and that the disorder in A1 reflects natural sound statistics. To provide a computational model of the tonotopic disorder in A1, we used a model that was originally proposed for the smooth V1 map. In contrast to natural images, natural sounds exhibit distant correlations, which were learned and reflected in the disordered map. The auditory model predicted harmonic relationships among neighbouring A1 cells; furthermore, the same mechanism used to model V1 complex cells reproduced nonlinear responses similar to the pitch selectivity. These results contribute to the understanding of the sensory cortices of different modalities in a novel and integrated manner.
A Convex Relaxation for Weakly Supervised Classifiers
This paper introduces a general multi-class approach to weakly supervised classification. Inferring the labels and learning the parameters of the model is usually done jointly through a block-coordinate descent algorithm such as expectation-maximization (EM), which may lead to local minima. To avoid this problem, we propose a cost function based on a convex relaxation of the soft-max loss. We then propose an algorithm specifically designed to efficiently solve the corresponding semidefinite program (SDP). Empirically, our method compares favorably to standard ones on different datasets for multiple instance learning and semi-supervised learning, as well as on clustering tasks.
On Causal and Anticausal Learning
Schoelkopf, Bernhard, Janzing, Dominik, Peters, Jonas, Sgouritsa, Eleni, Zhang, Kun, Mooij, Joris
We consider the problem of function estimation in the case where an underlying causal model can be inferred. This has implications for popular scenarios such as covariate shift, concept drift, transfer learning and semi-supervised learning. We argue that causal knowledge may facilitate some approaches for a given problem, and rule out others. In particular, we formulate a hypothesis for when semi-supervised learning can help, and corroborate it with empirical results.
Shift-Invariance Sparse Coding for Audio Classification
Grosse, Roger, Raina, Rajat, Kwong, Helen, Ng, Andrew Y.
Sparse coding is an unsupervised learning algorithm that learns a succinct high-level representation of the inputs given only unlabeled data; it represents each input as a sparse linear combination of a set of basis functions. Originally applied to modeling the human visual cortex, sparse coding has also been shown to be useful for self-taught learning, in which the goal is to solve a supervised classification task given access to additional unlabeled data drawn from different classes than that in the supervised learning problem. Shift-invariant sparse coding (SISC) is an extension of sparse coding which reconstructs a (usually time-series) input using all of the basis functions in all possible shifts. In this paper, we present an efficient algorithm for learning SISC bases. Our method is based on iteratively solving two large convex optimization problems: The first, which computes the linear coefficients, is an L1-regularized linear least squares problem with potentially hundreds of thousands of variables. Existing methods typically use a heuristic to select a small subset of the variables to optimize, but we present a way to efficiently compute the exact solution. The second, which solves for bases, is a constrained linear least squares problem. By optimizing over complex-valued variables in the Fourier domain, we reduce the coupling between the different variables, allowing the problem to be solved efficiently. We show that SISC's learned high-level representations of speech and music provide useful features for classification tasks within those domains. When applied to classification, under certain conditions the learned features outperform state of the art spectral and cepstral features.