Statistical Learning
Topic Correlation Analysis for Cross-Domain Text Classification
Li, Lianghao (Tsinghua University) | Jin, Xiaoming (Tsinghua University) | Long, Mingsheng (Tsinghua University)
Cross-domain text classification aims to automatically train a precise text classifier for a target domain by using labeled text data from a related source domain. To this end, the distribution gap between different domains has to be reduced. In previous works, a certain number of shared latent features (e.g., latent topics, principal components, etc.) are extracted to represent documents from different domains, and thus reduce the distribution gap. However, only relying the shared latent features as the domain bridge may limit the amount of knowledge transferred. This limitation is more serious when the distribution gap is so large that only a small number of latent features can be shared between domains. In this paper, we propose a novel approach named Topic Correlation Analysis (TCA), which extracts both the shared and the domain-specific latent features to facilitate effective knowledge transfer. In TCA, all word features are first grouped into the shared and the domain-specific topics using a joint mixture model. Then the correlations between the two kinds of topics are inferred and used to induce a mapping between the domain-specific topics from different domains. Finally, both the shared and the mapped domain-specific topics are utilized to span a new shared feature space where the supervised knowledge can be effectively transferred. The experimental results on two real-world data sets justify the superiority of the proposed method over the stat-of-the-art baselines.
Teaching Machines to Learn by Metaphors
Levy, Omer (Technion - Israel Institute of Technology) | Markovitch, Shaul (Technion - Israel Institute of Technology)
Humans have an uncanny ability to learn new concepts with very few examples. Cognitive theories have suggested that this is done by utilizing prior experience of related tasks. We propose to emulate this process in machines, by transforming new problems into old ones. These transformations are called metaphors. Obviously, the learner is not given a metaphor, but must acquire one through a learning process. We show that learning metaphors yield better results than existing transfer learning methods. Moreover, we argue that metaphors give a qualitative assessment of task relatedness.
Learning the Kernel Matrix with Low-Rank Multiplicative Shaping
Levinboim, Tomer (University of Southern California) | Sha, Fei (University of Southern California)
Selecting the optimal kernel is an important and difficult challenge in applying kernel methods to pattern recognition. To address this challenge, multiple kernel learning (MKL) aims to learn a kernel from a combination of base kernel functions that perform optimally on the task. In this paper, we propose a novel MKL-themed approach to combine base kernels that are multiplicatively shaped with low-rank positive semidefinitve matrices. The proposed approach generalizes several popular MKL methods and thus provides more flexibility in modeling data. Computationally, we show how these low-rank matrices can be learned efficiently from data using convex quadratic programming. Empirical studies on several standard benchmark datasets for MKL show that the new approach often improves prediction accuracy statistically significantly over very competitive single kernel and other MKL methods.
Multi-Label Learning on Tensor Product Graph
Jiang, Jonathan (City University of Hong Kong)
A large family of graph-based semi-supervised algorithms have been developed intuitively and pragmatically for the multi-label learning problem. These methods, however, only implicitly exploited the label correlation, as either part of graph weight or an additional constraint, to improve overall classification performance. Despite their seemingly quite different formulations, we show that all existing approaches can be uniformly referred to as a Label Propagation (LP) or Random Walk with Restart (RWR) on a Cartesian Product Graph (CPG). Inspired by this discovery, we introduce a new framework for multi-label classification task, employing the Tensor Product Graph (TPG) โ the tensor product of the data graph with the class (label) graph โ in which not only the intra-class but also the inter-class associations are explicitly represented as weighted edges among graph vertices. In stead of computing directly on TPG, we derive an iterative algorithm, which is guaranteed to converge and with the same computational complexity and the same amount of storage as the standard label propagation on the original data graph. Applications to four benchmark multi-label data sets illustrate that our method outperforms several state-of-the-art approaches.
Multi-Label Learning by Exploiting Label Correlations Locally
Huang, Sheng-Jun (Nanjing University) | Zhou, Zhi-Hua (Nanjing University)
It is well known that exploiting label correlations is important for multi-label learning. Existing approaches typically exploit label correlations globally, by assuming that the label correlations are shared by all the instances. In real-world tasks, however, different instances may share different label correlations, and few correlations are globally applicable. In this paper, we propose the ML-LOC approach which allows label correlations to be exploited locally. To encode the local influence of label correlations, we derive a LOC code to enhance the feature representation of each instance. The global discrimination fitting and local correlation sensitivity are incorporated into a unified framework, and an alternating solution is developed for the optimization. Experimental results on a number of image, text and gene data sets validate the effectiveness of our approach.
Learning SVM Classifiers with Indefinite Kernels
Gu, Suicheng (Temple University) | Guo, Yuhong (Temple University)
Recently, training support vector machines with indefinite kernels has attracted great attention in the machine learning community. In this paper, we tackle this problem by formulating a joint optimization model over SVM classifications and kernel principal component analysis. We first reformulate the kernel principal component analysis as a general kernel transformation framework, and then incorporate it into the SVM classification to formulate a joint optimization model. The proposed model has the advantage of making consistent kernel transformations over training and test samples. It can be used for both binary classification and multi-class classification problems. Our experimental results on both synthetic data sets and real world data sets show the proposed model can significantly outperform related approaches.
Sparse Principal Component Analysis with Constraints
Grbovic, Mihajlo (Temple University) | Dance, Christopher Roger (Xerox Research Centre Europe) | Vucetic, Slobodan (Temple University)
The sparse principal component analysis is a variant of the classical principal component analysis, which finds linear combinations of a small number of features that maximize variance across data. In this paper we propose a methodology for adding two general types of feature grouping constraints into the original sparse PCA optimization procedure.We derive convex relaxations of the considered constraints, ensuring the convexity of the resulting optimization problem. Empirical evaluation on three real-world problems, one in process monitoring sensor networks and two in social networks, serves to illustrate the usefulness of the proposed methodology.
Classification of Sparse Time Series via Supervised Matrix Factorization
Grabocka, Josif (University of Hildesheim) | Nanopoulos, Alexandros (University of Hildesheim ) | Schmidt-Thieme, Lars (University of Hildesheim)
Data sparsity is an emerging real-world problem observed in a various domains ranging from sensor networks to medical diagnosis. Consecutively, numerous machine learning methods were modeled to treat missing values. Nevertheless, sparsity, defined as missing segments, has not been thoroughly investigated in the context of time series classification. We propose a novel principle for classifying time series, which in contrast to existing approaches, avoids reconstructing the missing segments in time series and operates solely on the observed ones. Based on the proposed principle, we develop a method that prevents adding noise that incurs during the reconstruction of the original time series. Ourmethod adapts supervised matrix factorization by projecting time series in a latent space through stochasticlearning. Furthermore the projected data is built in a supervised fashion via a logistic regression. Abundant experiments on a large collection of 37 data sets demonstrate the superiority of our method, which in the majority of cases outperforms a set of baselines that do not follow our proposed principle.
Efficient Multi-Stage Conjugate Gradient for Trust Region Step
Gong, Pinghua (Tsinghua University) | Zhang, Changshui (Tsinghua University)
The trust region step problem, by solving a sphere constrained quadratic programming, plays a critical role in the trust region Newton method. In this paper, we propose an efficient Multi-Stage Conjugate Gradient (MSCG) algorithm to compute the trust region step in a multi-stage manner. Specifically, when the iterative solution is in the interior of the sphere, we perform the conjugate gradient procedure. Otherwise, we perform a gradient descent procedure which points to the inner of the sphere and can make the next iterative solution be a interior point. Subsequently, we proceed with the conjugate gradient procedure again. We repeat the above procedures until convergence. We also present a theoretical analysis which shows that the MSCG algorithm converges. Moreover, the proposed MSCG algorithm can generate a solution in any prescribed precision controlled by a tolerance parameter which is the only parameter we need. Experimental results on large-scale text data sets demonstrate our proposed MSCG algorithm has a faster convergence speed compared with the state-of-the-art algorithms.
A Spin-Glass Model for Semi-Supervised Community Detection
Eaton, Eric (Bryn Mawr College) | Mansbach, Rachael (University of Illinois at Urbana-Champaign)
Current modularity-based community detection methods show decreased performance as relational networks become increasingly noisy. These methods also yield a large number of diverse community structures as solutions, which is problematic for applications that impose constraints on the acceptable solutions or in cases where the user is focused on specific communities of interest. To address both of these problems, we develop a semi-supervised spin-glass model that enables current community detection methods to incorporate background knowledge in the forms of individual labels and pairwise constraints. Unlike current methods, our approach shows robust performance in the presence of noise in the relational network, and the ability to guide the discovery process toward specific community structures. We evaluate our algorithm on several benchmark networks and a new political sentiment network representing cooperative events between nations that was mined from news articles over six years.