Support Vector Machines


What is a Support Vector Machine, and Why Would I Use it?

#artificialintelligence

This post originally appeared on the Yhat blog. Yhat is a Brooklyn-based company whose goal is to make data science applicable for developers, data scientists, and businesses alike. Yhat provides a software platform for deploying and managing predictive algorithms as REST APIs, while eliminating the painful engineering obstacles associated with production environments, such as testing, versioning, scaling and security. SVM is a supervised machine learning algorithm that can be used for classification or regression problems. It uses a technique called the kernel trick to transform your data and then, based on these transformations, finds an optimal boundary between the possible outputs.
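The kernel trick mentioned above can be illustrated with a small sketch. It is not taken from the Yhat post: the degree-2 polynomial kernel, the feature map `phi`, and the sample points are assumptions chosen for illustration. The point is that a kernel function returns the inner product of the transformed data without ever building the transformed vectors.

```python
import math

def dot(a, b):
    """Plain inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))

def phi(x):
    """Explicit degree-2 feature map for a 2-D point (illustrative choice)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def poly2_kernel(x, z):
    """Homogeneous polynomial kernel of degree 2: (x . z) ** 2."""
    return dot(x, z) ** 2

# Two hypothetical sample points.
x, z = (1.0, 2.0), (3.0, -1.0)

explicit = dot(phi(x), phi(z))   # transform first, then take the inner product
implicit = poly2_kernel(x, z)    # same value, computed in the original 2-D space

print(explicit, implicit)
```

Both quantities agree up to floating-point rounding, which is why an SVM can search for a boundary in the transformed space while only ever evaluating the kernel on the original inputs.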


Learn Support Vector Machine (SVM) from Scratch in R

@machinelearnbot

Suppose no straight line (or hyperplane) can separate the two classes. In the image shown below, red and blue data points are spread around a circle in 2D such that adjacent data points are of different colors. SVM handles this case by using a kernel function, which lets it work with data that are not linearly separable. It is explained in the next section.


liquidSVM: A Fast and Versatile SVM package

arXiv.org Machine Learning

liquidSVM is a package written in C++ that provides SVM-type solvers for various classification and regression tasks. Because of a fully integrated hyper-parameter selection, very carefully implemented solvers, multi-threading and GPU support, and several built-in data decomposition strategies it provides unprecedented speed for small training sizes as well as for data sets of tens of millions of samples. Besides the C++ API and a command line interface, bindings to R, MATLAB, Java, Python, and Spark are available. We present a brief description of the package and report experimental comparisons to other SVM packages.


dlib C++ Library - Machine Learning

#artificialintelligence

It learns the parameter vector by formulating the problem as a structural SVM problem. The exact details of the method are described in the paper Max-Margin Object Detection by Davis E. King.


SCOPE: Scalable Composite Optimization for Learning on Spark

AAAI Conferences

Many machine learning models, such as logistic regression (LR) and support vector machine (SVM), can be formulated as composite optimization problems. Recently, many distributed stochastic optimization (DSO) methods have been proposed to solve the large-scale composite optimization problems, which have shown better performance than traditional batch methods. However, most of these DSO methods might not be scalable enough. In this paper, we propose a novel DSO method, called scalable composite optimization for learning (SCOPE), and implement it on the fault-tolerant distributed platform Spark. SCOPE is both computation-efficient and communication-efficient. Theoretical analysis shows that SCOPE is convergent with linear convergence rate when the objective function is strongly convex. Furthermore, empirical results on real datasets show that SCOPE can outperform other state-of-the-art distributed learning methods on Spark, including both batch learning methods and DSO methods.
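One common reading of "composite optimization" here is an objective made of an average per-example loss plus a regularizer. The sketch below evaluates that objective for both models named in the abstract. It is not the SCOPE algorithm: the function names, the L2 regularizer, the weight `lam`, and the toy data are all assumptions for illustration.

```python
import math

def dot(a, b):
    """Plain inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))

def hinge(margin):
    """SVM loss on the signed margin y * <w, x>."""
    return max(0.0, 1.0 - margin)

def logistic(margin):
    """Logistic regression loss on the signed margin y * <w, x>."""
    return math.log1p(math.exp(-margin))

def composite_objective(w, data, loss, lam):
    """(1/n) * sum_i loss(y_i * <w, x_i>) + (lam / 2) * ||w||^2."""
    n = len(data)
    avg_loss = sum(loss(y * dot(w, x)) for x, y in data) / n
    reg = 0.5 * lam * dot(w, w)
    return avg_loss + reg

# Hypothetical toy data: (features, label) with labels in {+1, -1}.
data = [((1.0, 0.0), 1), ((0.0, 1.0), -1)]
w = (1.0, -1.0)

svm_obj = composite_objective(w, data, hinge, lam=0.1)
lr_obj = composite_objective(w, data, logistic, lam=0.1)
print(svm_obj, lr_obj)
```

With `lam > 0` the L2 term makes the whole objective strongly convex, which is the setting in which the abstract states its linear convergence result.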


Improving Efficiency of SVM k-Fold Cross-Validation by Alpha Seeding

AAAI Conferences

The k-fold cross-validation is commonly used to evaluate the effectiveness of SVMs with the selected hyper-parameters. It is known that the SVM k-fold cross-validation is expensive, since it requires training k SVMs. However, little work has explored reusing the h-th SVM for training the (h+1)-th SVM for improving the efficiency of k-fold cross-validation. In this paper, we propose three algorithms that reuse the h-th SVM for improving the efficiency of training the (h+1)-th SVM. Our key idea is to efficiently identify the support vectors and to accurately estimate their associated weights (also called alpha values) of the next SVM by using the previous SVM. Our experimental results show that our algorithms are several times faster than the k-fold cross-validation which does not make use of the previously trained SVM. Moreover, our algorithms produce the same results (hence same accuracy) as the k-fold cross-validation which does not make use of the previously trained SVM.
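To see why reusing the previous SVM is plausible, it helps to count how much training data two consecutive rounds of k-fold cross-validation share. The sketch below does only that counting. It does not implement the paper's alpha seeding algorithms, and the helper names and the values `n=100`, `k=10` are assumptions for illustration.

```python
def kfold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    base, extra = divmod(n, k)
    folds, start = [], 0
    for h in range(k):
        size = base + (1 if h < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def training_sets(n, k):
    """Training indices for each round: everything except that round's fold."""
    all_idx = set(range(n))
    return [all_idx - set(fold) for fold in kfold_indices(n, k)]

train = training_sets(n=100, k=10)

# Instances used to train both the first and the second SVM.
shared = len(train[0] & train[1])
overlap = shared / len(train[1])
print(shared, len(train[1]), round(overlap, 3))
```

With equal-sized folds, consecutive rounds share a fraction (k-2)/(k-1) of their training instances, so most of the second training set was already seen by the first SVM.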


Sarcasm Suite: A Browser-Based Engine for Sarcasm Detection and Generation

AAAI Conferences

Sarcasm Suite is a browser-based engine that deploys five of our past papers in sarcasm detection and generation. The sarcasm detection modules use four kinds of incongruity: sentiment incongruity, semantic incongruity, historical context incongruity and conversational context incongruity. The sarcasm generation module is a chatbot that responds sarcastically to user input. With a visually appealing interface that indicates predictions using 'faces' of our co-authors from our past papers, Sarcasm Suite is our first demonstration of our work in computational sarcasm.


ATSUM: Extracting Attractive Summaries for News Propagation on Microblogs

AAAI Conferences

In this paper, we investigate how to automatically extract attractive summaries for news propagation on microblogs and propose a novel system called ATSUM to achieve this goal via text attractiveness analysis. It first analyzes the sentences in a news article and automatically predicts the attractiveness score of each sentence using the support vector regression method. The predicted attractiveness scores are then incorporated into a summarization system. Experimental results on a manually labeled dataset verify the effectiveness of the proposed methods.


Wikitop: Using Wikipedia Category Network to Generate Topic Trees

AAAI Conferences

Automated topic identification is an essential component in various information retrieval and knowledge representation tasks such as automated summary generation, categorization search and document indexing. In this paper, we present the Wikitop system to automatically generate topic trees from the input text by performing hierarchical classification using the Wikipedia Category Network (WCN). Our preliminary results over a collection of 125 articles are encouraging and show the potential of a robust methodology for automated topic tree generation.


Healthy Cognitive Aging: A Hybrid Random Vector Functional-Link Model for the Analysis of Alzheimer’s Disease

AAAI Conferences

Alzheimer's disease (AD) is a genetically complex neurodegenerative disease, which leads to irreversible brain damage, severe cognitive problems and ultimately death. A number of clinical trials and study initiatives have been set up to investigate AD pathology, leading to large amounts of high dimensional heterogeneous data (biomarkers) for analysis. This paper focuses on combining clinical features from different modalities, including medical imaging, cerebrospinal fluid (CSF), etc., to diagnose AD and predict potential progression. Due to privacy and legal issues involved with clinical research, the study cohort (number of patients) is relatively small, compared to thousands of available biomarkers (predictors). We propose a hybrid pathological analysis model, which integrates manifold learning and Random Vector functional-link network (RVFL) so as to achieve better ability to extract discriminant information with limited training materials. Furthermore, we model (current and future) cognitive healthiness as a regression problem about age. By comparing the difference between predicted age and actual age, we manage to show statistical differences between different pathological stages. Verification tests are conducted based on the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database. Extensive comparison is made against different machine learning algorithms, namely Support Vector Machine (SVM), Random Forest (RF), Decision Tree and Multilayer Perceptron (MLP). Experimental results show that our proposed algorithm achieves better results than the comparison targets, which indicates promising robustness for practical clinical implementation.