 Support Vector Machines


Impact of Word Sense Disambiguation on Ordering Dictionary Definitions in Vocabulary Learning Tutors

AAAI Conferences

Past research has shown that dictionaries and glosses can be beneficial in computer-assisted language learning, particularly in vocabulary learning. We propose that L2 vocabulary learners can benefit from the use of a dictionary whose definitions are sensitive to the provided reading context, and that advances in the natural language processing task of word sense disambiguation can be used to automatically order the definitions of such a dictionary. An in-vivo study was conducted with ESL students to investigate the effect that the order of definitions has on vocabulary learning using REAP, a computer-based vocabulary tutor. Our results showed that students benefited from having the algorithmically determined best definitions listed at the top of the definition list. Furthermore, our results suggest that word sense disambiguation may currently be good enough for use in intelligent language tutoring environments.


Evaluation of Ontology Knowledge in Chinese Classical Poetry Classification

AAAI Conferences

This paper describes preliminary research on the use of ontological knowledge for the task of automatically classifying classical Chinese poetry (CCCP) according to authorship. Based on a collection of poems written by Liu Yong (987–1053 AD) and Su Shi (1037–1101 AD), which have been analyzed according to a taxonomy of ontological entities at the lexical level, the research examines whether characteristic features can be automatically extracted as important stylistic differences between the two poets. This paper examines the efficiency of different ontological concepts as features in CCCP using Support Vector Machines (SVMs). The experiment shows that integrating ontological knowledge with bag-of-words (BoW) features performs better on CCCP than BoW alone, with overall increases of 2.1% and 2.2% in precision and F-score, respectively.


Dissimilarity Kernels for Paraphrase Identification

AAAI Conferences

We present in this paper a novel solution to the problem of paraphrase identification based on lexical dissimilarity kernels. Lexical kernels in conjunction with Support Vector Machines are preferred over other learning methods, e.g. decision trees, due to their ability to handle a high number of features. Dissimilarity-based kernels emphasize dissimilarities among text fragments and therefore are appropriate for text similarity tasks characterized by high lexical overlap. We conducted experiments with our kernels on the Microsoft Research (MSR) Paraphrase Corpus, a standardized data set used for assessing approaches to paraphrase identification. Our reported accuracy results are competitive and robust when compared to state-of-the-art single-model approaches. The results were obtained using 10-fold cross-validation over the entire corpus. We also report competitive results on the test portion of the MSR Paraphrase Corpus, which is the standard way to report results on this corpus.


Automatic Detection of User’s Uncertainty in Problem Solving Task: a Multimodal Approach

AAAI Conferences

This paper presents a novel multimodal approach to automatically detecting a learner’s uncertainty through the integration of multiple sensors. An acquisition protocol was established to record participants’ electrical brain activity and physiological signals while they interacted with a problem solving system specifically designed for uncertainty elicitation. Data were collected from 38 subjects using 8 sensors and two video feeds. Results from machine learning classifiers support the feasibility of our approach. An accuracy of 81% was reached using the Support Vector Machine (SVM) algorithm.


Spectrum Sensing for Cognitive Radio Using Kernel-Based Learning

arXiv.org Machine Learning

The kernel method is a very powerful tool in machine learning. The kernel trick has been effectively and extensively applied in many areas of machine learning, such as support vector machines (SVMs) and kernel principal component analysis (kernel PCA). The kernel trick is to define a kernel function that gives the inner product of data in the feature space without knowing the feature-space data explicitly. In this paper, the kernel trick is employed to extend the algorithm of spectrum sensing with the leading eigenvector under the framework of PCA to a higher dimensional feature space. Namely, the leading eigenvector of the sample covariance matrix in the feature space is used for spectrum sensing without knowing the leading eigenvector explicitly. Spectrum sensing with the leading eigenvector under the framework of kernel PCA is proposed with the inner product as a measure of similarity. A modified kernel GLRT algorithm based on the matched subspace model is applied to spectrum sensing for the first time. The experimental results on a simulated sinusoidal signal show that spectrum sensing with kernel PCA is about 4 dB better than PCA; kernel GLRT is also better than GLRT. The proposed algorithms are also tested on the measured DTV signal. The simulation results show that kernel methods are 4 dB better than the corresponding linear methods. The leading eigenvector of the sample covariance matrix learned by kernel PCA is more stable than that learned by PCA for different segments of the DTV signal.
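The idea of comparing leading eigenvectors in feature space without ever forming them can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the RBF kernel, the `gamma` value, the absence of centering, and the function names are all assumptions made here. The leading eigenvector is represented only by coefficients over the samples, and the inner product of two such eigenvectors reduces to kernel evaluations.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    """Gram matrix k(a_i, b_j) = exp(-gamma * ||a_i - b_j||^2) (assumed kernel)."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def linear_kernel(A, B):
    """Plain inner product; with this kernel the method reduces to ordinary PCA."""
    return A @ B.T

def leading_coefficients(X, kernel):
    """Coefficients a such that v = sum_i a_i * phi(x_i) is the unit-norm
    leading eigenvector of the (uncentered) feature-space covariance."""
    K = kernel(X, X)
    eigvals, eigvecs = np.linalg.eigh(K)   # eigenvalues in ascending order
    lam, u = eigvals[-1], eigvecs[:, -1]
    return u / np.sqrt(lam)                # ensures a^T K a = 1

def eigenvector_similarity(X1, X2, kernel):
    """|<v1, v2>| computed purely from kernel evaluations."""
    a1 = leading_coefficients(X1, kernel)
    a2 = leading_coefficients(X2, kernel)
    return abs(a1 @ kernel(X1, X2) @ a2)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))
self_sim = eigenvector_similarity(X, X, rbf_kernel)

# With the linear kernel the implicit eigenvector can be formed explicitly.
v_linear = X.T @ leading_coefficients(X, linear_kernel)
```

A segment compared with itself gives a similarity of 1, and values near 1 across different segments would indicate a stable leading eigenvector.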


Asymptotic Normality of Support Vector Machine Variants and Other Regularized Kernel Methods

arXiv.org Machine Learning

In nonparametric classification and regression problems, regularized kernel methods, in particular support vector machines, attract much attention in theoretical and in applied statistics. In an abstract sense, regularized kernel methods (simply called SVMs here) can be seen as regularized M-estimators for a parameter in a (typically infinite dimensional) reproducing kernel Hilbert space. For smooth loss functions, it is shown that the difference between the estimator, i.e. the empirical SVM, and the theoretical SVM is asymptotically normal with rate $\sqrt{n}$. That is, the standardized difference converges weakly to a Gaussian process in the reproducing kernel Hilbert space. As is common in real applications, the choice of the regularization parameter may depend on the data. The proof is done by an application of the functional delta-method and by showing that the SVM-functional is suitably Hadamard-differentiable.
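In symbols, the stated result might be written as below. The notation is an assumption introduced here for illustration and is not taken from the listing: $f_{D_n,\lambda_n}$ denotes the empirical SVM computed from $n$ observations with a possibly data-dependent regularization parameter $\lambda_n$, $f_{P,\lambda_0}$ the theoretical SVM under the data distribution $P$, $H$ the reproducing kernel Hilbert space, and $\mathbb{H}$ the limiting Gaussian process.

$$\sqrt{n}\,\bigl(f_{D_n,\lambda_n} - f_{P,\lambda_0}\bigr) \;\rightsquigarrow\; \mathbb{H} \quad \text{in } H \qquad (n \to \infty)$$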


Narrowing the Modeling Gap: A Cluster-Ranking Approach to Coreference Resolution

Journal of Artificial Intelligence Research

Traditional learning-based coreference resolvers operate by training the mention-pair model for determining whether two mentions are coreferent or not. Though conceptually simple and easy to understand, the mention-pair model is linguistically rather unappealing and lags far behind the heuristic-based coreference models proposed in the pre-statistical NLP era in terms of sophistication. Two independent lines of recent research have attempted to improve the mention-pair model, one by acquiring the mention-ranking model to rank preceding mentions for a given anaphor, and the other by training the entity-mention model to determine whether a preceding cluster is coreferent with a given mention. We propose a cluster-ranking approach to coreference resolution, which combines the strengths of the mention-ranking model and the entity-mention model, and is therefore theoretically more appealing than both of these models. In addition, we seek to improve cluster rankers via two extensions: (1) lexicalization and (2) incorporating knowledge of anaphoricity by jointly modeling anaphoricity determination and coreference resolution. Experimental results on the ACE data sets demonstrate the superior performance of cluster rankers to competing approaches as well as the effectiveness of our two extensions.


Differentially Private Empirical Risk Minimization

arXiv.org Artificial Intelligence

Privacy-preserving machine learning algorithms are crucial for the increasingly common setting in which personal data, such as medical or financial records, are analyzed. We provide general techniques to produce privacy-preserving approximations of classifiers learned via (regularized) empirical risk minimization (ERM). These algorithms are private under the $\epsilon$-differential privacy definition due to Dwork et al. (2006). First we apply the output perturbation ideas of Dwork et al. (2006) to ERM classification. Then we propose a new method, objective perturbation, for privacy-preserving machine learning algorithm design. This method entails perturbing the objective function before optimizing over classifiers. If the loss and regularizer satisfy certain convexity and differentiability criteria, we prove theoretical results showing that our algorithms preserve privacy, and provide generalization bounds for linear and nonlinear kernels. We further present a privacy-preserving technique for tuning the parameters in general machine learning algorithms, thereby providing end-to-end privacy guarantees for the training process. We apply these results to produce privacy-preserving analogues of regularized logistic regression and support vector machines. We obtain encouraging results from evaluating their performance on real demographic and benchmark data sets. Our results show that, both theoretically and empirically, objective perturbation is superior to the previous state-of-the-art, output perturbation, in managing the inherent tradeoff between privacy and learning performance.
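Output perturbation can be sketched as adding calibrated random noise to an already trained weight vector. The sketch below is illustrative only and rests on assumptions that the abstract does not spell out: feature vectors have norm at most 1, the loss has derivative bounded by 1, the regularizer is `(lam / 2) * ||w||^2`, and the noise density is proportional to `exp(-beta * ||b||)` with `beta = n * lam * eps / 2`. The function names are hypothetical.

```python
import numpy as np

def sample_noise(d, beta, rng):
    """Draw b in R^d with density proportional to exp(-beta * ||b||):
    a uniformly random direction scaled by a Gamma(d, 1/beta) norm."""
    direction = rng.normal(size=d)
    direction /= np.linalg.norm(direction)
    norm = rng.gamma(shape=d, scale=1.0 / beta)
    return norm * direction

def output_perturbation(w, n, lam, eps, rng):
    """Return a noisy copy of the ERM solution w (assumed already trained).

    n: number of training examples, lam: regularization strength,
    eps: privacy parameter. The scale beta = n * lam * eps / 2 relies on
    the assumptions stated in the lead-in.
    """
    beta = n * lam * eps / 2.0
    return w + sample_noise(w.shape[0], beta, rng)

rng = np.random.default_rng(0)
w = np.array([0.4, -1.2, 0.7])          # stand-in for a trained classifier
w_private = output_perturbation(w, n=1000, lam=0.1, eps=0.5, rng=rng)
```

Smaller `eps` (stronger privacy) or fewer examples gives a smaller `beta` and therefore larger noise, which is the privacy and accuracy tradeoff the abstract refers to.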


Learning with Support Vector Machines

Morgan & Claypool Publishers

Support Vector Machines have become a well-established tool within machine learning. They work well in practice and have now been used across a wide range of applications from recognizing hand-written digits, to face identification, text categorisation, bioinformatics, and database marketing. In this book we give an introductory overview of this subject. We start with a simple Support Vector Machine for performing binary classification before considering multi-class classification and learning in the presence of noise. We show that this framework can be extended to many other scenarios such as prediction with real-valued outputs, novelty detection and the handling of complex output structures such as parse trees.


Training linear ranking SVMs in linearithmic time using red-black trees

arXiv.org Machine Learning

Learning to rank has been a task of significant interest in recent years. The ranking problem has been largely motivated by applications in areas such as web search and recommender systems. Due to the large amounts of data available in these domains, the algorithms used must scale well; methods that run in close to linear time are preferable. For a detailed introduction to the topic of learning to rank, we refer to (Liu, 2009; Fürnkranz and Hüllermeier, 2011). In this work we assume the so-called scoring setting, where each data instance is associated with a utility score reflecting its goodness with respect to the ranking criterion. A successful approach for learning ranking functions has been to consider pairwise preferences (Fürnkranz and Hüllermeier, 2005). In this setting, the aim is to minimize the number of pairwise mis-orderings in the ranking produced when ordering a set of examples according to predicted utility scores. A number of machine learning algorithms optimizing relaxations of this criterion have been proposed, such as RankBoost (Freund et al., 2003), RankNet (Burges et al., 2005), RankRLS (Pahikkala et al., 2007, 2009), and the subject of this study, the ranking support vector machine (RankSVM) algorithm (Herbrich et al., 1999; Joachims, 2002). The original solution proposed for RankSVM optimization was to train a support vector machine (SVM) classifier on pairs of data examples.
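The pairwise criterion and the pair-based training set can be illustrated with a short sketch. It is a naive quadratic-time version written for clarity, not the linearithmic method the title refers to. The function names are hypothetical, and two conventions are assumptions made here: ties in the predictions are not counted as errors, and tied true scores produce no training pair.

```python
import numpy as np

def count_misorderings(y, f):
    """Number of pairs (i, j) with true score y[i] > y[j] but predicted
    score f[i] < f[j]. Naive O(n^2) loop over all ordered pairs."""
    n = len(y)
    return sum(
        1
        for i in range(n)
        for j in range(n)
        if y[i] > y[j] and f[i] < f[j]
    )

def pairwise_differences(X, y):
    """Turn a scored data set into a binary classification problem on
    difference vectors x_i - x_j, labeled +1 if y[i] > y[j] and -1 otherwise."""
    diffs, labels = [], []
    for i in range(len(y)):
        for j in range(i + 1, len(y)):
            if y[i] == y[j]:
                continue                 # no preference between tied examples
            diffs.append(X[i] - X[j])
            labels.append(1 if y[i] > y[j] else -1)
    return np.array(diffs), np.array(labels)

y = [3, 2, 1]                            # true utility scores
f = [0.9, 0.1, 0.5]                      # predicted scores
errors = count_misorderings(y, f)        # only the pair (1, 2) is mis-ordered

X = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
D, labels = pairwise_differences(X, y)
```

The number of difference vectors grows quadratically with the number of examples, which is why training a classifier directly on pairs scales poorly on large data sets.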