Support Vector Machines
Margin Based PU Learning
Gong, Tieliang (Xi'an Jiaotong University) | Wang, Guangtao (University of Michigan) | Ye, Jieping (University of Michigan) | Xu, Zongben (Xi'an Jiaotong University) | Lin, Ming (University of Michigan)
The PU learning problem concerns learning from positive and unlabeled data. A popular heuristic is to iteratively enlarge the training set based on some margin-based criterion. However, little theoretical analysis has been conducted to support the success of these heuristic methods. In this work, we show that not all margin-based heuristic rules are able to improve the learned classifiers iteratively. We find that a so-called large positive margin oracle is necessary to guarantee the success of PU learning. Under this oracle, a provable positive-margin based PU learning algorithm is proposed for linear regression and classification under truncated Gaussian distributions. The proposed algorithm is able to reduce the recovering error geometrically proportional to the positive margin. Extensive experiments on real-world datasets verify our theory and the state-of-the-art performance of the proposed PU learning algorithm.
Unsupervised Domain Adaptation With Distribution Matching Machines
Cao, Yue (Tsinghua University) | Long, Mingsheng (Tsinghua University) | Wang, Jianmin (Tsinghua University)
Domain adaptation generalizes a learning model across a source domain and a target domain that follow different distributions. Most existing work follows a two-step procedure: first, it explores either feature matching or instance reweighting independently, and second, it trains the transfer classifier separately. In this paper, we show that either feature matching or instance reweighting can only reduce, but not remove, the cross-domain discrepancy, and that the knowledge hidden in the relations between the data labels from the source and target domains is important for unsupervised domain adaptation. We propose a new Distribution Matching Machine (DMM) based on the structural risk minimization principle, which learns a transfer support vector machine by extracting invariant feature representations and estimating unbiased instance weights that jointly minimize the cross-domain distribution discrepancy. This leads to a robust transfer learner that performs well against both mismatched features and irrelevant instances. Our theoretical analysis proves that the proposed approach further reduces the generalization error bound of related domain adaptation methods. Comprehensive experiments validate that the DMM approach significantly outperforms competitive methods on standard domain adaptation benchmarks.
Unified Locally Linear Classifiers With Diversity-Promoting Anchor Points
Liu, Chenghao (Zhejiang University, China) | Zhang, Teng (Singapore Management University, Singapore) | Zhao, Peilin (Zhejiang University) | Sun, Jianling (Alibaba-Zhejiang University Joint Institute of Frontier Technologies) | Hoi, Steven C. H. (South China University of Technology)
Locally Linear Support Vector Machine (LLSVM) has been actively used in classification tasks due to its capability of classifying nonlinear patterns. However, existing LLSVM suffers from two drawbacks: (1) a particular and appropriate regularization for LLSVM has not yet been addressed; (2) it usually adopts a three-stage learning scheme composed of learning anchor points by clustering, learning local coding coordinates by a predefined coding scheme, and finally training classifiers. We argue that this decoupled approach oversimplifies the original optimization problem, resulting in a large deviation due to the disparate purpose of each step. To address the first issue, we propose a novel diversified regularization which can capture infrequent patterns and reduce the model size without sacrificing the representation power. Based on this regularization, we develop a joint optimization algorithm over anchor points, local coding coordinates and classifiers to simultaneously minimize the overall classification risk, which we term Diversified and Unified Locally Linear Support Vector Machine (DU-LLSVM for short). To the best of our knowledge, DU-LLSVM is the first principled method that directly learns sparse local coding and can be easily generalized to other supervised learning models. Extensive experiments showed that DU-LLSVM consistently surpassed several state-of-the-art methods with a predefined local coding scheme (e.g. LLSVM) or supervised anchor point learning (e.g. SAPL-LLSVM).
Deception Detection in Videos
Wu, Zhe (University of Maryland College Park) | Singh, Bharat (University of Maryland College Park) | Davis, Larry S. (University of Maryland College Park) | Subrahmanian, V. S. (Dartmouth College)
We present a system for covert automated deception detection using information available in a video. We study the importance of different modalities like vision, audio and text for this task. On the vision side, our system uses classifiers trained on low-level video features which predict human micro-expressions. We show that predictions of high-level micro-expressions can be used as features for deception prediction. Surprisingly, IDT (Improved Dense Trajectory) features, which have been widely used for action recognition, are also very good at predicting deception in videos. We fuse the scores of classifiers trained on IDT features and high-level micro-expressions to improve performance. MFCC (Mel-frequency Cepstral Coefficients) features from the audio domain also provide a significant boost in performance, while information from transcripts is not very beneficial for our system. Using various classifiers, our automated system obtains an AUC of 0.877 (10-fold cross-validation) when evaluated on subjects who were not part of the training set. Even though state-of-the-art methods use human annotations of micro-expressions for deception detection, our fully automated approach outperforms them by 5%. When combined with human annotations of micro-expressions, our AUC improves to 0.922. We also present results of a user study to analyze how well average humans perform on this task, what modalities they use for deception detection and how they perform if only one modality is accessible.
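As a rough illustration of the two ideas the abstract leans on, score fusion and AUC, the sketch below averages the scores of two classifiers and computes AUC as the fraction of (positive, negative) pairs ranked correctly. Averaging is only an assumed fusion rule (the abstract does not say how scores are fused), and the score lists, labels, and function names are made up for the example.

```python
def auc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fuse(*score_lists):
    """Late fusion by averaging per-sample scores (an assumed rule, not the paper's)."""
    return [sum(s) / len(s) for s in zip(*score_lists)]


# Made-up scores for six clips: 1 = deceptive, 0 = truthful.
labels = [1, 1, 1, 0, 0, 0]
idt_scores = [0.9, 0.4, 0.7, 0.5, 0.2, 0.3]        # hypothetical IDT-based classifier
microexpr_scores = [0.6, 0.8, 0.3, 0.4, 0.5, 0.1]  # hypothetical micro-expression classifier
fused_scores = fuse(idt_scores, microexpr_scores)

for name, scores in [("IDT", idt_scores),
                     ("micro-expressions", microexpr_scores),
                     ("fused", fused_scores)]:
    print(f"{name}: AUC = {auc(scores, labels):.3f}")
```

In this toy case each classifier misranks a different clip, so the averaged scores rank every deceptive clip above every truthful one; real data will rarely be this clean.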
RSDNE: Exploring Relaxed Similarity and Dissimilarity from Completely-Imbalanced Labels for Network Embedding
Wang, Zheng (Tsinghua University) | Ye, Xiaojun (Tsinghua University) | Wang, Chaokun (Tsinghua University) | Wu, Yuexin (Tsinghua University) | Wang, Changping (Tsinghua University) | Liang, Kaiwen (Tsinghua University)
Network embedding, aiming to project a network into a low-dimensional space, is increasingly becoming a focus of network research. Semi-supervised network embedding takes advantage of labeled data, and has shown promising performance. However, existing semi-supervised methods would get unappealing results in the completely-imbalanced label setting where some classes have no labeled nodes at all. To alleviate this, we propose a novel semi-supervised network embedding method, termed Relaxed Similarity and Dissimilarity Network Embedding (RSDNE). Specifically, to benefit from the completely-imbalanced labels, RSDNE guarantees both intra-class similarity and inter-class dissimilarity in an approximate way. Experimental results on several real-world datasets demonstrate the superiority of the proposed method.
Spatial Decompositions for Large Scale SVMs
Thomann, Philipp, Blaschzyk, Ingrid, Meister, Mona, Steinwart, Ingo
Although support vector machines (SVMs) are theoretically well understood, their underlying optimization problem becomes very expensive if, for example, hundreds of thousands of samples and a non-linear kernel are considered. Several approaches have been proposed in the past to address this serious limitation. In this work we investigate a decomposition strategy that learns on small, spatially defined data chunks. Our contributions are twofold: on the theoretical side we establish an oracle inequality for the overall learning method using the hinge loss, and show that the resulting rates match those known for SVMs solving the complete optimization problem with Gaussian kernels. On the practical side we compare our approach to learning SVMs on small, randomly chosen chunks. Here it turns out that for comparable training times our approach is significantly faster during testing and also reduces the test error significantly in most cases. Furthermore, we show that our approach easily scales up to 10 million training samples: including hyper-parameter selection using cross validation, the entire training only takes a few hours on a single machine. Finally, we report an experiment on 32 million training samples. All experiments used liquidSVM (Steinwart and Thomann, 2017).
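The decomposition idea can be sketched as follows: split the input space into cells, fit one small model per cell, and route each test point to the model of its own cell. This is a minimal sketch under stated assumptions, not the liquidSVM implementation. The cells here are Voronoi cells around hand-picked centers, and the local learner is a stand-in majority vote rather than an SVM; all function names are hypothetical.

```python
from collections import Counter


def nearest_center(point, centers):
    """Index of the center closest to `point` (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centers[k])))


def majority_vote(X, y):
    """Stand-in local learner: always predicts the most common label in its cell."""
    label = Counter(y).most_common(1)[0][0]
    return lambda point: label


def fit_spatial(X, y, centers, fit_local):
    """Train one local model per non-empty cell induced by `centers`."""
    cells = {}
    for point, label in zip(X, y):
        pts, labs = cells.setdefault(nearest_center(point, centers), ([], []))
        pts.append(point)
        labs.append(label)
    return {k: fit_local(pts, labs) for k, (pts, labs) in cells.items()}


def predict_spatial(models, centers, point, default=None):
    """Route `point` to its cell; only that one local model is evaluated."""
    model = models.get(nearest_center(point, centers))
    return model(point) if model is not None else default


# XOR-like toy data: no single linear rule separates it, but every cell is pure.
X = [(0.1, 0.1), (0.2, 0.0), (0.9, 1.0), (1.0, 0.8),
     (0.0, 0.9), (0.1, 1.0), (1.0, 0.1), (0.9, 0.2)]
y = [1, 1, 1, 1, -1, -1, -1, -1]
centers = [(0, 0), (1, 1), (0, 1), (1, 0)]

models = fit_spatial(X, y, centers, majority_vote)
predictions = [predict_spatial(models, centers, p) for p in X]
print(predictions == y)
```

Because a test point touches only the model of its own cell, prediction cost depends on the size of that cell rather than on the full training set, which is one plausible reading of the test-time speedup the abstract reports. Swapping `majority_vote` for a kernel SVM trainer with the same `(X, y) -> callable` shape would bring the sketch closer to the method described.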
Applying Cooperative Machine Learning to Speed Up the Annotation of Social Signals in Large Multi-modal Corpora
Wagner, Johannes, Baur, Tobias, Zhang, Yue, Valstar, Michel F., Schuller, Björn, André, Elisabeth
Scientific disciplines such as Behavioural Psychology, Anthropology and, more recently, Social Signal Processing are concerned with the systematic exploration of human behaviour. A typical workflow includes the manual annotation (also called coding) of social signals in multi-modal corpora of considerable size. For the annotators involved, this is an exhausting and time-consuming task. In the article at hand we present a novel method and also provide the tools to speed up the coding procedure. To this end, we suggest and evaluate the use of Cooperative Machine Learning (CML) techniques to reduce manual labelling efforts by combining the power of computational capabilities and human intelligence. The proposed CML strategy starts with a small number of labelled instances and concentrates on predicting local parts first. Afterwards, a session-independent classification model is created to finish the remaining parts of the database. Confidence values are computed to guide the manual inspection and correction of the predictions. To bring the proposed approach into application we introduce NOVA, an open-source tool for collaborative and machine-aided annotations. In particular, it gives labellers immediate access to CML strategies and directly provides visual feedback on the results. Our experiments show that the proposed method has the potential to significantly reduce human labelling efforts.
A Game-Theoretic Approach to Design Secure and Resilient Distributed Support Vector Machines
Distributed Support Vector Machines (DSVM) have been developed to solve large-scale classification problems in networked systems with a large number of sensors and control units. However, the systems become more vulnerable as detection and defense are increasingly difficult and expensive. This work aims to develop secure and resilient DSVM algorithms under adversarial environments in which an attacker can manipulate the training data to achieve his objective. We establish a game-theoretic framework to capture the conflicting interests between an adversary and a set of distributed data processing units. The Nash equilibrium of the game allows predicting the outcome of learning algorithms in adversarial environments, and enhancing the resilience of the machine learning through dynamic distributed learning algorithms. We prove that the convergence of the distributed algorithm is guaranteed without assumptions on the training data or network topologies. Numerical experiments are conducted to corroborate the results. We show that network topology plays an important role in the security of DSVM. Networks with fewer nodes and higher average degrees are more secure. Moreover, a balanced network is found to be less vulnerable to attacks.
Predicting Rain in Seattle using SVM Machine Learning Model
I came across an interesting dataset called the SeattleWeather dataset and decided to use it for my first post on Medium. In this tutorial, I will use this dataset to predict rain in Seattle using a linear SVM model. First, let's explore the dataset. In this example, I have not used the date as a feature for predicting rain, so I dropped the date column. The other three columns (precipitation, maximum temperature, and minimum temperature) are equally important for predicting rain.
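To make the setup concrete, here is a minimal sketch of a linear SVM trained on the three features named above. It does not load the real dataset: the rows are synthetic stand-ins, and the rule that rainy days have non-zero precipitation is an assumption made for the example. The trainer is plain batch subgradient descent on the regularised hinge loss, written with the standard library only; in practice a library class such as scikit-learn's `LinearSVC` would be the usual choice.

```python
import random


def standardize(rows):
    """Scale each column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [(sum((v - m) ** 2 for v in c) / len(c)) ** 0.5 or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def train_linear_svm(X, y, lam=0.001, lr=0.1, epochs=300):
    """Batch subgradient descent on the L2-regularised hinge loss; y in {-1, +1}."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Only points inside the margin contribute to the hinge subgradient.
            if yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) < 1:
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, X):
    return [1 if sum(wj * xj for wj, xj in zip(w, xi)) + b >= 0 else -1 for xi in X]


# Synthetic stand-in rows: [precipitation, max temperature, min temperature].
rng = random.Random(0)
rows, labels = [], []
for _ in range(200):
    rained = rng.random() < 0.5
    prcp = rng.uniform(0.2, 1.0) if rained else 0.0  # assumed rule, not real data
    tmax = rng.uniform(45, 85)
    tmin = tmax - rng.uniform(5, 20)
    rows.append([prcp, tmax, tmin])
    labels.append(1 if rained else -1)

X = standardize(rows)
w, b = train_linear_svm(X, labels)
accuracy = sum(p == t for p, t in zip(predict(w, b, X), labels)) / len(labels)
print(f"training accuracy: {accuracy:.2f}")
print("weights (precipitation, tmax, tmin):", [round(wj, 2) for wj in w])
```

Standardising first matters here because temperatures are on a much larger scale than precipitation and would otherwise dominate the dot product. The accuracy printed is on the training rows only; a real evaluation would hold out a test split.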