Statistical Learning
Semi-Supervised Classification using Sparse Gaussian Process Regression
Patel, Amrish (Indian Institute of Science) | Sundararajan, S. (Yahoo! Labs) | Shevade, Shirish (Indian Institute of Science)
Gaussian Processes (GPs) are promising Bayesian methods for classification and regression problems. They have also been used for semi-supervised learning tasks. In this paper, we propose a new algorithm for solving semi-supervised binary classification problem using sparse GP regression (GPR) models. It is closely related to semi-supervised learning based on support vector regression (SVR) and maximum margin clustering. The proposed algorithm is simple and easy to implement. It gives a sparse solution directly unlike the SVR based algorithm. Also, the hyperparameters are estimated easily without resorting to expensive cross-validation technique. Use of sparse GPR model helps in making the proposed algorithm scalable. Preliminary results on synthetic and real-world data sets demonstrate the efficacy of the new algorithm.
Domain Adaptation via Transfer Component Analysis
Pan, Sinno Jialin (Hong Kong University of Science and Technology) | Tsang, Ivor W. (Nanyang Technological University) | Kwok, James T. (Hong Kong University of Science and Technology) | Yang, Qiang (Hong Kong University of Science and Technology)
Domain adaptation solves a learning problem in a target domain by utilizing the training data in a different but related source domain. Intuitively, discovering a good feature representation across domains is crucial. In this paper, we propose to find such a representation through a new learning method, transfer component analysis ( TCA ), for domain adaptation. TCA tries to learn some transfer components across domains in a Reproducing Kernel Hilbert Space (RKHS) using Maximum Mean Discrepancy (MMD). In the subspace spanned by these transfer components , data distributions in different domains are close to each other. As a result, with the new representations in this subspace, we can apply standard machine learning methods to train classifiers or regression models in the source domain for use in the target domain. The main contribution of our work is that we propose a novel feature representation in which to perform domain adaptation via a new parametric kernel using feature extraction methods, which can dramatically minimize the distance between domain distributions by projecting data onto the learned transfer components . Furthermore, our approach can handle large datsets and naturally lead to out-of-sample generalization. The effectiveness and efficiency of our approach in are verified by experiments on two real-world applications: cross-domain indoor WiFi localization and cross-domain text classification.
Spectral Embedded Clustering
Nie, Feiping (Tsinghua University) | Xu, Dong (Nanyang Technological University) | Tsang, Ivor Wai-Hung (Nanyang Technological University) | Zhang, Changshui (Tsinghua University)
In this paper, we propose a new spectral clustering method, referred to as Spectral Embedded Clustering (SEC), to minimize the normalized cut criterion in spectral clustering as well as control the mismatch between the cluster assignment matrix and the low dimensional embedded representation of the data. SEC is based on the observation that the cluster assignment matrix of high dimensional data can be represented by a low dimensional linear mapping of data. We also discover the connection between SEC and other clustering methods, such as spectral clustering, Clustering with local and global regularization, K-means and Discriminative K-means. The experiments on many real-world data sets show that SEC significantly outperforms the existing spectral clustering methods as well as K-means clustering related methods.
Semi-Supervised Learning of Visual Classifiers from Web Images and Text
Morsillo, Nicholas (University of Rochester) | Pal, Christopher (École Polytechnique de Montréal) | Nelson, Randal (University of Rochester)
The web holds tremendous potential as a source of training data for visual classification. However, web images must be correctly indexed and labeled before this potential can be realized. Accordingly, there has been considerable recent interest in collecting imagery from the web using image search engines to build databases for object and scene recognition research. While search engines can provide rough sets of image data, results are noisy and this leads to problems when training classifiers. In this paper we propose a semi-supervised model for automatically collecting clean example imagery from the web. Our approach includes both visual and textual web data in a unified framework. Minimal supervision is enabled by the selective use of generative and discriminative elements in a probabilistic model and a novel learning algorithm. We show through experiments that our model discovers good training images from the web with minimal manual work. Classifiers trained using our method significantly outperform analogous baseline approaches on the Caltech-256 dataset.
Spectral Kernel Learning for Semi-Supervised Classification
Liu, Wei (The Chinese University of Hong Kong) | Qian, Buyue (University of California - Davis) | Cui, Jingyu (Stanford University) | Liu, Jianzhuang (The Chinese University of Hong Kong)
Typical graph-theoretic approaches for semi-supervised classification infer labels of unlabeled instances with the help of graph Laplacians. Founded on the spectral decomposition of the graph Laplacian, this paper learns a kernel matrix via minimizing the leave-one-out classification error on the labeled instances. To this end, an efficient algorithm is presented based on linear programming, resulting in a transductive spectral kernel. The idea of our algorithm stems from regularization methodology and also has a nice interpretation in terms of spectral clustering. A simple classifier can be readily built upon the learned kernel, which suffices to give prediction for any data point aside from those in the available dataset. Besides this usage, the spectral kernel can be effectively used in tandem with conventional kernel machines such as SVMs. We demonstrate the efficacy of the proposed algorithm through experiments carried out on challenging classification tasks.
Learning the Optimal Neighborhood Kernel for Classification
Liu, Jun (Arizona State University) | Chen, Jianhui (Arizona State University) | Chen, Songcan (Nanjing University of Aeronautics and Astronautics) | Ye, Jieping (Arizona State University)
Kernel methods have been applied successfully in many applications. The kernel matrix plays an important role in kernel-based learning methods, but the ideal kernel matrix is usually unknown in practice and needs to be estimated. In this paper, we propose to directly learn the ideal kernel matrix (called the optimal neighborhood kernel matrix) from a pre-specified kernel matrix for improved classification performance. We assume that the pre-specified kernel matrix generated from the specific application is a noisy observation of the ideal one. The resulting optimal neighborhood kernel matrix is shown to be the summation of the pre-specified kernel matrix and a rank-one matrix. We formulate the problem of learning the optimal neighborhood kernel as a constrained quartic problem, and propose to solve it using two methods: level method and constrained gradient descent. Empirical results on several benchmark data sets demonstrate the efficiency and effectiveness of the proposed algorithms.
Exponential Family Sparse Coding with Applications to Self-taught Learning
Lee, Honglak (Stanford University) | Raina, Rajat (Stanford University) | Teichman, Alex (Stanford University) | Ng, Andrew (Stanford University)
Sparse coding is an unsupervised learning algorithm for finding concise, slightly higher-level representations of inputs, and has been successfully applied to self-taught learning, where the goal is to use unlabeled data to help on a supervised learning task, even if the unlabeled data cannot be associated with the labels of the supervised task (Raina et al., 2007). However, sparse coding uses a Gaussian noise model and a quadratic loss function, and thus performs poorly if applied to binary valued, integer valued, or other non-Gaussian data, such as text. Drawing on ideas from generalized linear models (GLMs), we present a generalization of sparse coding to learning with data drawn from any exponential family distribution (such as Bernoulli, Poisson, etc). This gives a method that we argue is much better suited to model other data types than Gaussian. We present an algorithm for solving the L1-regularized optimization problem defined by this model, and show that it is especially efficient when the optimal solution is sparse. We also show that the new model results in significantly improved self-taught learning performance when applied to text classification and to a robotic perception task.
Unsupervised Rank Aggregation with Domain-Specific Expertise
Klementiev, Alexandre (University of Illinois at Urbana-Champaign) | Roth, Dan (University of Illinois at Urbana-Champaign) | Small, Kevin (University of Illinois at Urbana-Champaign) | Titov, Ivan (University of Illinois at Urbana-Champaign)
Consider the setting where a panel of judges is repeatedly asked to (partially) rank sets of objects according to given criteria, and assume that the judges' expertise depends on the objects' domain. Learning to aggregate their rankings with the goal of producing a better joint ranking is a fundamental problem in many areas of Information Retrieval and Natural Language Processing, amongst others. However, supervised ranking data is generally difficult to obtain, especially if coming from multiple domains. Therefore, we propose a framework for learning to aggregate votes of constituent rankers with domain specific expertise without supervision. We apply the learning framework to the settings of aggregating full rankings and aggregating top-k lists, demonstrating significant improvements over a domain-agnostic baseline in both cases.
gRegress: Extracting Features from Graph Transactions for Regression
Ketkar, Nikhil S. (Washington State University) | Holder, Lawrence B. (Washington State University) | Cook, Diane J. (Washington State University)
In this work we propose gRegress, a new algorithm which given set of labeled graphs and a real value associated with each graph extracts the complete set of subgraphs such that a) each subgraph in this set has correlation with the real value above a user-specified threshold and b) each subgraph in this set has correlation with any other subgraph in the set below a user-specified threshold. gRegress incorporates novel pruning mechanisms based on correlation of a subgraph feature with the output and correlation with other subgraph features. These pruning mechanisms lead to significant speedup. Experimental results indicate that in terms of runtime, gRegress substantially outperforms gSpan, often by an order of magnitude while the regression models produced by both approaches have comparable accuracy.
Semi-Supervised Classification on Evolutionary Data
Jia, Yangqing (Tsinghua University) | Yan, Shuicheng (National University of Singapore) | Zhang, Changshui (Tsinghua University)
In this paper, we consider semi-supervised classification on evolutionary data, where the distribution of the data and the underlying concept that we aim to learn change over time due to short-term noises and long-term drifting, making a single aggregated classifier inapplicable for long-term classification. The drift is smooth if we take a localized view over the time dimension, which enables us to impose temporal smoothness assumption for the learning algorithm. We first discuss how to carry out such assumption using temporal regularizers defined in a structural way with respect to the Hilbert space, and then derive the online algorithm that efficiently finds the closed-form solution to the classification functions. Experimental results on real-world evolutionary mailing list data demonstrate that our algorithm outperforms classical semi-supervised learning algorithms in both algorithmic stability and classification accuracy.