Goto

Collaborating Authors

 Support Vector Machines


Dataless Text Classification with Descriptive LDA

AAAI Conferences

Manually labeling documents for training a text classifier is expensive and time-consuming. Moreover, a classifier trained on labeled documents may suffer from overfitting and adaptability problems. Dataless text classification (DLTC) has been proposed as a solution to these problems, since it does not require labeled documents. Previous research in DLTC has used explicit semantic analysis of Wikipedia content to measure semantic distance between documents, which is in turn used to classify test documents based on nearest neighbours. The semantic-based DLTC method has a major drawback in that it relies on a large-scale, finely-compiled semantic knowledge base, which is difficult to obtain in many scenarios. In this paper we propose a novel kind of model, descriptive LDA (DescLDA), which performs DLTC with only category description words and unlabeled documents. In DescLDA, the LDA model is assembled with a describing device to infer Dirichlet priors from prior descriptive documents created with category description words. The Dirichlet priors are then used by LDA to induce category-aware latent topics from unlabeled documents. Experimental results with the 20Newsgroups and RCV1 datasets show that: (1) our DLTC method is more effective than the semantic-based DLTC baseline method; and (2) the accuracy of our DLTC method is very close to state-of-the-art supervised text classification methods. As neither external knowledge resources nor labeled documents are required, our DLTC method is applicable to a wider range of scenarios.


Pattern-Based Variant-Best-Neighbors Respiratory Motion Prediction Using Orthogonal Polynomials Approximation

AAAI Conferences

Motion-adaptive radiotherapy techniques are promising to deliver truly ablative radiation doses to tumors with minimal normal tissue exposure by accounting for real-time tumor movement. However, a major challenge of successful applications of these techniques is the real-time prediction of breathing-induced tumor motion to accommodate system delivery latencies. Predicting respiratory motion in real-time is challenging. The current respiratory motion prediction approaches are still not satisfactory in terms of accuracy and interpretability due to the complexity of breathing patterns and the high inter-individual variability across patients. In this paper, we propose a novel respiratory motion prediction framework which integrates four key components: a personalized monitoring window generator, an orthogonal polynomial approximation-based pattern library builder, a variant best neighbor pattern searcher, and a statistical prediction decision maker. The four functional components work together into a real-time prediction system and is capable of performing personalized tumor position prediction during radiotherapy. Based on a study of respiratory motion of 27 patients with lung cancer, the proposed prediction approach generated consistently better prediction performances than the current respiratory motion prediction approaches, particularly for long prediction horizons.


On the Impossibility of Convex Inference in Human Computation

AAAI Conferences

Human computation or crowdsourcing involves joint inference of the ground-truth-answers and the worker-abilities by optimizing an objective function, for instance, by maximizing the data likelihood based on an assumed underlying model. A variety of methods have been proposed in the literature to address this inference problem. As far as we know, none of the objective functions in existing methods is convex. In machine learning and applied statistics, a convex function such as the objective function of support vector machines (SVMs) is generally preferred, since it can leverage the high-performance algorithms and rigorous guarantees established in the extensive literature on convex optimization. One may thus wonder if there exists a meaningful convex objective function for the inference problem in human computation. In this paper, we investigate this convexity issue for human computation. We take an axiomatic approach by formulating a set of axioms that impose two mild and natural assumptions on the objective function for the inference. Under these axioms, we show that it is unfortunately impossible to ensure convexity of the inference problem. On the other hand, we show that interestingly, in the absence of a requirement to model "spammers", one can construct reasonable objective functions for crowdsourcing that guarantee convex inference.


Min-Max Kernels

arXiv.org Machine Learning

The min-max kernel is a generalization of the popular resemblance kernel (which is designed for binary data). In this paper, we demonstrate, through an extensive classification study using kernel machines, that the min-max kernel often provides an effective measure of similarity for nonnegative data. As the min-max kernel is nonlinear and might be difficult to be used for industrial applications with massive data, we show that the min-max kernel can be linearized via hashing techniques. This allows practitioners to apply min-max kernel to large-scale applications using well matured linear algorithms such as linear SVM or logistic regression. The previous remarkable work on consistent weighted sampling (CWS) produces samples in the form of ($i^*, t^*$) where the $i^*$ records the location (and in fact also the weights) information analogous to the samples produced by classical minwise hashing on binary data. Because the $t^*$ is theoretically unbounded, it was not immediately clear how to effectively implement CWS for building large-scale linear classifiers. In this paper, we provide a simple solution by discarding $t^*$ (which we refer to as the "0-bit" scheme). Via an extensive empirical study, we show that this 0-bit scheme does not lose essential information. We then apply the "0-bit" CWS for building linear classifiers to approximate min-max kernel classifiers, as extensively validated on a wide range of publicly available classification datasets. We expect this work will generate interests among data mining practitioners who would like to efficiently utilize the nonlinear information of non-binary and nonnegative data.


Identifying Meaningful Citations

AAAI Conferences

We introduce the novel task of identifying important citations in scholarly literature, i.e., citations that indicate that the cited work is used or extended in the new effort. We believe this task is a crucial component in algorithms that detect and follow research topics and in methods that measure the quality of publications.We model this task as a supervised classification problem at two levels of detail: a coarse one with classes (important vs. non-important), and a more detailed one with four importance classes. We annotate a dataset of approximately 450 citations with this information, and release it publicly. We propose a supervised classification approach that addresses this task with a battery of features that range from citation counts to where the citation appears in the body of the paper, and show that,our approach achieves a precision of 65% for a recall of 90%.


On Heterogeneous Machine Learning Ensembles for Wind Power Prediction

AAAI Conferences

For a sustainable integration of wind power into the electricity grid, a precise prediction method is required. In this work, we investigate the use of heterogeneous machine learning ensembles for wind power prediction. We first analyze homogeneous ensemble regressors that make use of a single base algorithm and compare decision trees to k-nearest neighbors and support vector regression. As next step, we construct heterogeneous ensembles that make use of multiple base algorithms and benefit from a gain of diversity of the weak predictors. In the experimental evaluation, we show that a combination of decision trees and support vector regression outperforms state-of-the-art predictors (improvements of up to 37% compared to support vector regression) as well as homogeneous ensembles while requiring a shorter runtime (speed-ups from 1.60x to 8.78x). The experiments are based on large wind time series data from simulations and real measurements.


Deep Learning using Linear Support Vector Machines

arXiv.org Machine Learning

Recently, fully-connected and convolutional neural networks have been trained to achieve state-of-the-art performance on a wide variety of tasks such as speech recognition, image classification, natural language processing, and bioinformatics. For classification tasks, most of these "deep learning" models employ the softmax activation function for prediction and minimize cross-entropy loss. In this paper, we demonstrate a small but consistent advantage of replacing the softmax layer with a linear support vector machine. Learning minimizes a margin-based loss instead of the cross-entropy loss. While there have been various combinations of neural nets and SVMs in prior art, our results using L2-SVMs show that by simply replacing softmax with linear SVMs gives significant gains on popular deep learning datasets MNIST, CIFAR-10, and the ICML 2013 Representation Learning Workshop's face expression recognition challenge.


The Responsibility Weighted Mahalanobis Kernel for Semi-Supervised Training of Support Vector Machines for Classification

arXiv.org Machine Learning

Kernel functions in support vector machines (SVM) are needed to assess the similarity of input samples in order to classify these samples, for instance. Besides standard kernels such as Gaussian (i.e., radial basis function, RBF) or polynomial kernels, there are also specific kernels tailored to consider structure in the data for similarity assessment. In this article, we will capture structure in data by means of probabilistic mixture density models, for example Gaussian mixtures in the case of real-valued input spaces. From the distance measures that are inherently contained in these models, e.g., Mahalanobis distances in the case of Gaussian mixtures, we derive a new kernel, the responsibility weighted Mahalanobis (RWM) kernel. Basically, this kernel emphasizes the influence of model components from which any two samples that are compared are assumed to originate (that is, the "responsible" model components). We will see that this kernel outperforms the RBF kernel and other kernels capturing structure in data (such as the LAP kernel in Laplacian SVM) in many applications where partially labeled data are available, i.e., for semi-supervised training of SVM. Other key advantages are that the RWM kernel can easily be used with standard SVM implementations and training algorithms such as sequential minimal optimization, and heuristics known for the parametrization of RBF kernels in a C-SVM can easily be transferred to this new kernel. Properties of the RWM kernel are demonstrated with 20 benchmark data sets and an increasing percentage of labeled samples in the training data.


Cost-Sensitive Support Vector Machines

arXiv.org Machine Learning

A new procedure for learning cost-sensitive SVM(CS-SVM) classifiers is proposed. The SVM hinge loss is extended to the cost sensitive setting, and the CS-SVM is derived as the minimizer of the associated risk. The extension of the hinge loss draws on recent connections between risk minimization and probability elicitation. These connections are generalized to cost-sensitive classification, in a manner that guarantees consistency with the cost-sensitive Bayes risk, and associated Bayes decision rule. This ensures that optimal decision rules, under the new hinge loss, implement the Bayes-optimal cost-sensitive classification boundary. Minimization of the new hinge loss is shown to be a generalization of the classic SVM optimization problem, and can be solved by identical procedures. The dual problem of CS-SVM is carefully scrutinized by means of regularization theory and sensitivity analysis and the CS-SVM algorithm is substantiated. The proposed algorithm is also extended to cost-sensitive learning with example dependent costs. The minimum cost sensitive risk is proposed as the performance measure and is connected to ROC analysis through vector optimization. The resulting algorithm avoids the shortcomings of previous approaches to cost-sensitive SVM design, and is shown to have superior experimental performance on a large number of cost sensitive and imbalanced datasets.


Domain-Adversarial Neural Networks

arXiv.org Machine Learning

We introduce a new representation learning algorithm suited to the context of domain adaptation, in which data at training and test time come from similar but different distributions. Our algorithm is directly inspired by theory on domain adaptation suggesting that, for effective domain transfer to be achieved, predictions must be made based on a data representation that cannot discriminate between the training (source) and test (target) domains. We propose a training objective that implements this idea in the context of a neural network, whose hidden layer is trained to be predictive of the classification task, but uninformative as to the domain of the input. Our experiments on a sentiment analysis classification benchmark, where the target domain data available at training time is unlabeled, show that our neural network for domain adaption algorithm has better performance than either a standard neural network or an SVM, even if trained on input features extracted with the state-of-the-art marginalized stacked denoising autoencoders of Chen et al. (2012).