Statistical Learning
Learning with Augmented Class by Exploiting Unlabeled Data
Da, Qing (Nanjing University) | Yu, Yang (Nanjing University) | Zhou, Zhi-Hua (Nanjing University)
In many real-world applications of learning, the environment is open and changes gradually, which requires the learning system to have the ability of detecting and adapting to the changes. Class-incremental learning (C-IL) is an important and practical problem where data from unseen augmented classes are fed, but has not been studied well in the past. In C-IL, the system should beware of predicting instances from augmented classes as a seen class, and thus faces the challenge that no such instances were observed during training stage. In this paper, we tackle the challenge by using unlabeled data, which can be cheaply collected in many real-world applications. We propose the LACU framework as well as the LACU-SVM approach to learn the concept of seen classes while incorporating the structure presented in the unlabeled data, so that the misclassification risks among the seen classes as well as between the augmented and the seen classes are minimized simultaneously. Experiments on diverse datasets show the effectiveness of the proposed approach.
Dropout Training for Support Vector Machines
Chen, Ning (Tsinghua University) | Zhu, Jun (Tsinghua University) | Chen, Jianfei (Tsinghua University) | Zhang, Bo (Tsinghua University)
Dropout and other feature noising schemes have shown promising results in controlling over-fitting by artificially corrupting the training data. Though extensive theoretical and empirical studies have been performed for generalized linear models, little work has been done for support vector machines (SVMs), one of the most successful approaches for supervised learning. This paper presents dropout training for linear SVMs. To deal with the intractable expectation of the non-smooth hinge loss under corrupting distributions, we develop an iteratively re-weighted least square (IRLS) algorithm by exploring data augmentation techniques. Our algorithm iteratively minimizes the expectation of a re-weighted least square problem, where the re-weights have closed-form solutions. The similar ideas are applied to develop a new IRLS algorithm for the expected logistic loss under corrupting distributions. Our algorithms offer insights on the connection and difference between the hinge loss and logistic loss in dropout training. Empirical results on several real datasets demonstrate the effectiveness of dropout training on significantly boosting the classification accuracy of linear SVMs.
A Local Non-Negative Pursuit Method for Intrinsic Manifold Structure Preservation
Chen, Dongdong (Sichuan University) | Lv, Jian Cheng (Sichuan University) | Yi, Zhang (Sichuan University)
The local neighborhood selection plays a crucial role for most representation based manifold learning algorithms. This paper reveals that an improper selection of neighborhood for learning representation will introduce negative components in the learnt representations. Importantly, the representations with negative components will affect the intrinsic manifold structure preservation. In this paper, a local non-negative pursuit (LNP) method is proposed for neighborhood selection and non-negative representations are learnt. Moreover, it is proved that the learnt representations are sparse and convex. Theoretical analysis and experimental results show that the proposed method achieves or outperforms the state-of-the-art results on various manifold learning problems.
Dynamic Bayesian Probabilistic Matrix Factorization
Chatzis, Sotirios (Cyprus University of Technology)
Collaborative filtering algorithms generally rely on the assumption that user preference patterns remain stationary. However, real-world relational data are seldom stationary. User preference patterns may change over time, giving rise to the requirement of designing collaborative filtering systems capable of detecting and adapting to preference pattern shifts. Motivated by this observation, in this paper we propose a dynamic Bayesian probabilistic matrix factorization model, designed for modeling time-varying distributions. Formulation of our model is based on imposition of a dynamic hierarchical Dirichlet process (dHDP) prior over the space of probabilistic matrix factorization models to capture the time-evolving statistical properties of modeled sequential relational datasets. We develop a simple Markov Chain Monte Carlo sampler to perform inference. We present experimental results to demonstrate the superiority of our temporal model.
LASS: A Simple Assignment Model with Laplacian Smoothing
Carreira-Perpinan, Miguel Angel (University of California, Merced) | Wang, Weiran (TTI Chicago)
We consider the problem of learning soft assignments of N items to K categories given two sources of information: an item-category similarity matrix, which encourages items to be assigned to categories they are similar to (and to not be assigned to categories they are dissimilar to), and an item-item similarity matrix, which encourages similar items to have similar assignments. We propose a simple quadratic programming model that captures this intuition. We give necessary conditions for its solution to be unique, define an out-of-sample mapping, and derive a simple, effective training algorithm based on the alternating direction method of multipliers. The model predicts reasonable assignments from even a few similarity values, and can be seen as a generalization of semisupervised learning. It is particularly useful when items naturally belong to multiple categories, as for example when annotating documents with keywords or pictures with tags, with partially tagged items, or when the categories have complex interrelations (e.g. hierarchical) that are unknown.
Optimal Neighborhood Preserving Visualization by Maximum Satisfiability
Bunte, Kerstin (Helsinki Institute for Information Technology HIIT and Aalto University) | Järvisalo, Matti (Helsinki Institute for Information Technology HIIT and University of Helsinki) | Berg, Jeremias (Helsinki Institute for Information Technology HIIT and University of Helsinki) | Myllymäki, Petri (Helsinki Institute for Information Technology HIIT and University of Helsinki) | Peltonen, Jaakko (Helsinki Institute for Information Technology HIIT and Aalto University and University of Tampere) | Kaski, Samuel (Helsinki Institute for Information Technology HIIT and Aalto University and University of Helsinki)
We present a novel approach to low-dimensional neighbor embedding for visualization, based on formulating an information retrieval based neighborhood preservation cost function as Maximum satisfiability on a discretized output display. The method has a rigorous interpretation as optimal visualization based on the cost function. Unlike previous low-dimensional neighbor embedding methods, our formulation is guaranteed to yield globally optimal visualizations, and does so reasonably fast. Unlike previous manifold learning methods yielding global optima of their cost functions, our cost function and method are designed for low-dimensional visualization where evaluation and minimization of visualization errors are crucial. Our method performs well in experiments, yielding clean embeddings of datasets where a state-of-the-art comparison method yields poor arrangements. In a real-world case study for semi-supervised WLAN signal mapping in buildings we outperform state-of-the-art methods.
Active Learning with Model Selection
Ali, Alnur (Carnegie Mellon University) | Caruana, Rich (Microsoft Research) | Kapoor, Ashish (Microsoft Research)
Most active learning methods avoid model selection by training models of one type (SVMs, boosted trees, etc.) using one pre-defined set of model hyperparameters. We propose an algorithm that actively samples data to simultaneously train a set of candidate models (different model types and/or different hyperparameters) and also select the best model from this set. The algorithm actively samples points for training that are most likely to improve the accuracy of the more promising candidate models, and also samples points for model selection---all samples count against the same labeling budget. This exposes a natural trade-off between the focused active sampling that is most effective for training models, and the unbiased sampling that is better for model selection. We empirically demonstrate on six test problems that this algorithm is nearly as effective as an active learning oracle that knows the optimal model in advance.
Supervised Transfer Sparse Coding
Al-Shedivat, Maruan (King Abdullah University of Science and Technology) | Wang, Jim Jing-Yan (University at Buffalo, The State University of New York) | Alzahrani, Majed (King Abdullah University of Science and Technology) | Huang, Jianhua Z. (Texas A&M University) | Gao, Xin (King Abdullah University of Science and Technology)
A combination of the sparse coding and transfer learning techniques was shown to be accurate and robust in classification tasks where training and testing objects have a shared feature space but are sampled from different underlying distributions, i.e., belong to different domains. The key assumption in such case is that in spite of the domain disparity, samples from different domains share some common hidden factors. Previous methods often assumed that all the objects in the target domain are unlabeled, and thus the training set solely comprised objects from the source domain. However, in real world applications, the target domain often has some labeled objects, or one can always manually label a small number of them. In this paper, we explore such possibility and show how a small number of labeled data in the target domain can significantly leverage classification accuracy of the state-of-the-art transfer sparse coding methods. We further propose a unified framework named supervised transfer sparse coding (STSC) which simultaneously optimizes sparse representation, domain transfer and classification. Experimental results on three applications demonstrate that a little manual labeling and then learning the model in a supervised fashion can significantly improve classification accuracy.
Semi-Supervised Matrix Completion for Cross-Lingual Text Classification
Xiao, Min (Temple University) | Guo, Yuhong (Temple University)
Cross-lingual text classification is the task of assigning labels to observed documents in a label-scarce target language domain by using a prediction model trained with labeled documents from a label-rich source language domain. Cross-lingual text classification is popularly studied in natural language processing area to reduce the expensive manual annotation effort required in the target language domain. In this work, we propose a novel semi-supervised representation learning approach to address this challenging task by inducing interlingual features via semi-supervised matrix completion. To evaluate the proposed learning technique, we conduct extensive experiments on eighteen cross language sentiment classification tasks with four different languages. The empirical results demonstrate the efficacy of the proposed approach, and show it outperforms a number of related cross-lingual learning methods.
Instance-Based Domain Adaptation in NLP via In-Target-Domain Logistic Approximation
Xia, Rui (Nanjing University of Science and Technology) | Yu, Jianfei (Nanjing University of Science and Technology) | Xu, Feng (Nanjing University of Science and Technology) | Wang, Shumei (Nanjing University of Science and Technology)
In the field of NLP, most of the existing domain adaptation studies belong to the feature-based adaptation, while the research of instance-based adaptation is very scarce. In this work, we propose a new instance-based adaptation model, called in-target-domain logistic approximation (ILA). In ILA, we adapt the source-domain data to the target domain by a logistic approximation. The normalized in-target-domain probability is assigned as an instance weight to each of the source-domain training data. An instance-weighted classification model is trained finally for the cross-domain classification problem. Compared to the previous techniques, ILA conducts instance adaptation in a dimensionality-reduced linear feature space to ensure efficiency in high-dimensional NLP tasks. The instance weights in ILA are learnt by leveraging the criteria of both maximum likelihood and minimum statistical distance. The empirical results on two NLP tasks including text categorization and sentiment classification show that our ILA model beats the state-of-the-art instance adaptation methods significantly, in cross-domain classification accuracy, parameter stability and computational efficiency.