AITopics | Support Vector Machines

Support Vector Machines

Support vector machines (SVMs, also support vector networks) are supervised learning models with associated learning algorithms that analyze data used for classification and regression analysis. (Wikipedia)

Finding Better Active Learners for Faster Literature Reviews

Yu, Zhe, Kraft, Nicholas A., Menzies, Tim

arXiv.org Artificial Intelligence, Feb-2-2018

Literature reviews can be time-consuming and tedious to complete. By cataloging and refactoring three state-of-the-art active learning techniques from evidence-based medicine and legal electronic discovery, this paper finds and implements FASTREAD, a faster technique for studying a large corpus of documents. This paper assesses FASTREAD using datasets generated from existing SE literature reviews (Hall, Wahono, Radjenović, Kitchenham et al.). Compared to manual methods, FASTREAD lets researchers find 95% of the relevant studies after reviewing an order of magnitude fewer papers. Compared to other state-of-the-art automatic methods, FASTREAD reviews 20-50% fewer studies while finding the same number of relevant primary studies in a systematic literature review.

data mining, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

1612.03224

Country: North America > United States (0.94)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)
Research Report > Promising Solution (0.68)

Industry: Law > Litigation (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Data Science > Data Mining (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.46)

Fast Incremental SVDD Learning Algorithm with the Gaussian Kernel

Jiang, Hansi, Wang, Haoyu, Hu, Wenhao, Kakde, Deovrat, Chaudhuri, Arin

arXiv.org Machine Learning, Feb-1-2018

Support vector data description (SVDD) is a machine learning technique that is used for single-class classification and outlier detection. The idea of SVDD is to find a set of support vectors that defines a boundary around the data. When dealing with online or large data, existing batch SVDD methods have to be rerun in each iteration. We propose an incremental learning algorithm for SVDD that uses the Gaussian kernel. This algorithm builds on the observation that all support vectors on the boundary have the same distance to the center of the sphere in a higher-dimensional feature space as mapped by the Gaussian kernel function. Each iteration involves only the existing support vectors and the new data point. Moreover, the algorithm is based solely on matrix manipulations; the support vectors and their corresponding Lagrange multipliers $\alpha_i$ are automatically selected and determined in each iteration. The complexity of our algorithm in each iteration is only $O(k^2)$, where $k$ is the number of support vectors. Experimental results on some real data sets indicate that FISVDD demonstrates significant gains in efficiency with almost no loss in either outlier detection accuracy or objective function value.
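The geometric observation in this abstract can be illustrated with a minimal sketch. It uses the standard SVDD expression for the squared feature-space distance to the center $a = \sum_i \alpha_i \phi(x_i)$ and the fact that a Gaussian kernel gives $K(x, x) = 1$. It is not the FISVDD algorithm itself: the function names, the bandwidth, the two support vectors, and the multipliers are all assumptions chosen for illustration.

```python
import math

def gaussian_kernel(x, z, bandwidth=1.0):
    """K(x, z) = exp(-||x - z||^2 / (2 * bandwidth^2)); K(x, x) is always 1."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))

def squared_distance_to_center(z, support_vectors, alphas, bandwidth=1.0):
    """Squared feature-space distance from phi(z) to the center
    a = sum_i alpha_i * phi(x_i), using K(z, z) = 1."""
    cross = sum(a * gaussian_kernel(x, z, bandwidth)
                for x, a in zip(support_vectors, alphas))
    center_norm = sum(
        ai * aj * gaussian_kernel(xi, xj, bandwidth)
        for xi, ai in zip(support_vectors, alphas)
        for xj, aj in zip(support_vectors, alphas)
    )
    return 1.0 - 2.0 * cross + center_norm

# Illustrative (assumed) support vectors and multipliers; alphas sum to 1.
support_vectors = [[0.0, 0.0], [2.0, 0.0]]
alphas = [0.5, 0.5]

# Both boundary support vectors sit at the same distance from the center.
d_sv = [squared_distance_to_center(x, support_vectors, alphas)
        for x in support_vectors]
# A point far from the data lies farther from the center.
d_far = squared_distance_to_center([10.0, 10.0], support_vectors, alphas)

print("support vector distances:", d_sv)
print("distant point distance:", d_far)
```

In this toy setup a new point would be flagged as an outlier when its squared distance exceeds the common value shared by the boundary support vectors.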

artificial intelligence, machine learning, support vector, (15 more...)

arXiv.org Machine Learning

1709.00139

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)

Engineering fast multilevel support vector machines

Sadrfaridpour, E., Razzaghi, T., Safro, I.

arXiv.org Machine Learning, Jan-30-2018

The support vector machine (SVM) is one of the most well-known supervised classification methods and has been used extensively in fields such as disease diagnosis, text categorization, and fraud detection. Training a nonlinear SVM classifier (such as one based on the Gaussian kernel) requires solving a convex quadratic programming (QP) model whose running time can be prohibitive for large-scale instances unless specialized acceleration techniques such as sampling, boosting, and hierarchical training are used. Another typical reason for increased running time is complex data sets (e.g., when the data is noisy, imbalanced, or incomplete) that require model selection techniques to find the best model parameters. The motivation behind this work was extensive applied experience with hard, large-scale, industrial (not necessarily highly heterogeneous) data sets for which fast linear SVMs, as well as many other fast methods, produced extremely low-quality results, and various nonlinear SVMs exhibited a strong trade-off between running time and quality. It has been noticed in multiple works that many different real-world data sets have a strong underlying multiscale (in some works called hierarchical) structure [35, 31, 37, 66] that can be discovered through careful definitions of coarse-grained resolutions.

artificial intelligence, machine learning, support vector machine, (16 more...)

arXiv.org Machine Learning

1707.07657

Country: North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)

Genre: Research Report > New Finding (0.68)

Industry:

Law Enforcement & Public Safety (0.48)
Information Technology > Security & Privacy (0.46)
Health & Medicine (0.46)
Education (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)

Machine learning for graph-based representations of three-dimensional discrete fracture networks

Valera, Manuel, Guo, Zhengyang, Kelly, Priscilla, Matz, Sean, Cantu, Vito Adrian, Percus, Allon G., Hyman, Jeffrey D., Srinivasan, Gowri, Viswanathan, Hari S.

arXiv.org Machine Learning, Jan-29-2018

Structural and topological information plays a key role in modeling flow and transport through fractured rock in the subsurface. Discrete fracture network (DFN) computational suites such as dfnWorks are designed to simulate flow and transport in such porous media. Flow and transport calculations reveal that a small backbone of fractures exists, where most flow and transport occurs. Restricting the flowing fracture network to this backbone provides a significant reduction in the network's effective size. However, the particle tracking simulations needed to determine the reduction are computationally intensive. Such methods may be impractical for large systems or for robust uncertainty quantification of fracture networks, where thousands of forward simulations are needed to bound system behavior. In this paper, we develop an alternative network reduction approach to characterizing transport in DFNs, by combining graph theoretical and machine learning methods. We consider a graph representation where nodes signify fractures and edges denote their intersections. Using random forest and support vector machines, we rapidly identify a subnetwork that captures the flow patterns of the full DFN, based primarily on node centrality features in the graph. Our supervised learning techniques train on particle-tracking backbone paths found by dfnWorks, but run in negligible time compared to those simulations. We find that our predictions can reduce the network to approximately 20% of its original size, while still generating breakthrough curves consistent with those of the original network.
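The graph representation described here (fractures as nodes, intersections as edges) can be sketched as follows, with degree centrality as one simple example of a node centrality feature. The abstract does not say which centrality measures the authors use, so that choice is an assumption, and the fracture ids and topology below are invented for illustration.

```python
# Hypothetical fracture intersections: each pair is an edge between two
# fracture ids (nodes). The ids and topology are made up for illustration.
intersections = [(0, 1), (1, 2), (1, 3), (3, 4), (1, 4)]

def degree_centrality(edges):
    """Fraction of the other nodes that each node is directly connected to."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    n = len(neighbors)
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}

centrality = degree_centrality(intersections)

# Print fractures from most to least connected.
for node, score in sorted(centrality.items(), key=lambda item: -item[1]):
    print(f"fracture {node}: degree centrality {score:.2f}")
```

A per-node feature of this kind could then be fed to a classifier that predicts whether each fracture belongs to the backbone.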

artificial intelligence, classifier, upstream oil & gas, (19 more...)

arXiv.org Machine Learning

1705.09866

Country:

North America > United States > California (0.46)
South America (0.14)
North America > Mexico (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Energy > Oil & Gas > Upstream (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.88)

Solving for multi-class using orthogonal coding matrices

Mills, Peter

arXiv.org Machine Learning, Jan-27-2018

Probability estimates are desirable in statistical classification both for gauging the accuracy of a classification result and for calibration. Here we describe a method of solving for the conditional probabilities in multi-class classification using orthogonal error correcting codes. The method is tested on six different datasets using support vector machines and compares favorably with an existing technique based on the one-versus-one multi-class method. Probabilities are validated based on the cumulative sum of a boolean evaluation of the correctness of the class label divided by the estimated probability. Probability estimation using orthogonal coding is simple and efficient and has the potential for faster classification results than the one-versus-one method.

artificial intelligence, machine learning, probability, (16 more...)

arXiv.org Machine Learning

1801.09055

Country: North America > United States > New York (0.14)

Genre: Research Report (0.65)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.58)

Spoken English Intelligibility Remediation with PocketSphinx Alignment and Feature Extraction Improves Substantially over the State of the Art

Gao, Yuan, Srivastava, Brij Mohan Lal, Salsman, James

arXiv.org Machine Learning, Jan-26-2018

We use automatic speech recognition to assess spoken English learner pronunciation based on the authentic intelligibility of the learners' spoken responses determined from support vector machine (SVM) classifier or deep learning neural network model predictions of transcription correctness. Using numeric features produced by PocketSphinx alignment mode and many recognition passes searching for the substitution and deletion of each expected phoneme and insertion of unexpected phonemes in sequence, the SVM models achieve 82% agreement with the accuracy of Amazon Mechanical Turk crowdworker transcriptions, up from 75% reported by multiple independent researchers. Using such features with SVM classifier probability prediction models can help computer-aided pronunciation teaching (CAPT) systems provide intelligibility remediation.

Index Terms: phoneme alignment, pronunciation assessment, computer aided language learning, binary features

1. INTRODUCTION

Authentic intelligibility, the ability of listeners to correctly transcribe recorded utterances, initially used for CAPT by [1] and [2], is a better measure of pronunciation assessment for spoken language learners compared to mispronunciations identified by expert pronunciation judges or panels of experts, because such mispronunciations are associated with only 16% of intelligibility problems, according to [3], who state: We investigated... which words are likely to be misrecognized and which words are likely to be marked as pronunciation errors. Words perceived as mispronounced remain intelligible in about half of all cases.

artificial intelligence, machine learning, phoneme, (16 more...)

arXiv.org Machine Learning

1709.01713

Country: Asia > India (0.29)

Genre: Research Report (0.40)

Industry:

Education (0.35)
Media > News (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)

Support Vector Machine Active Learning Algorithms with Query-by-Committee versus Closest-to-Hyperplane Selection

Bloodgood, Michael

arXiv.org Machine Learning, Jan-24-2018

The use of active learning has received a lot of interest for reducing annotation costs for text and speech processing applications [1], [2], [3], [4], [5], [6]. Many applications have the following three characteristics: 1) they have imbalanced data sets, 2) training data annotation is a burden, and 3) support vector machines (SVMs) are able to train high-performing systems for the application. Two examples of such applications are Text Classification (TC) and Relation Extraction (RE). Characteristics 2 and 3 suggest the use of AL-SVM (Active Learning (AL) with Support Vector Machines). Previous work has presented an AL-SVM algorithm that selects (i.e., requests labels for) the examples that are closest to the current model's hyperplane [7], [8], [9], [10]. This "closest"-based algorithm has been shown to need modification for imbalanced data situations [11]. Previous work has presented a method for adapting to imbalanced data situations in the context of AL-SVM by using asymmetric cost factors during model training [11]. The asymmetric cost model has been shown to be most effective when the model is based on prevalence statistics from an unbiased initial sample of data and serves as positive amplification for the minority positive examples.
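The closest-to-hyperplane selection rule mentioned above can be sketched in a few lines of plain Python. This is a minimal illustration for a linear model only; the weights, bias, pool contents, document ids, and the `select_closest` helper are hypothetical and do not come from the paper.

```python
# Hypothetical current linear SVM model w.x + b (values are assumptions).
w = [0.8, -0.5]
b = 0.1

# Unlabeled pool: illustrative feature vectors keyed by a made-up id.
pool = {
    "doc_a": [1.0, 1.0],
    "doc_b": [0.1, 0.4],
    "doc_c": [-2.0, 1.5],
    "doc_d": [0.5, 0.9],
}

def decision_value(x):
    """Signed score w.x + b; its magnitude grows with distance to the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def select_closest(pool, batch_size=1):
    """Return ids of the batch_size examples with the smallest |w.x + b|."""
    ranked = sorted(pool, key=lambda doc_id: abs(decision_value(pool[doc_id])))
    return ranked[:batch_size]

# These are the examples whose labels would be requested next.
queries = select_closest(pool, batch_size=2)
print(queries)
```

Dividing by the norm of `w` would give the true geometric distance, but since that factor is the same for every example it does not change the ranking.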

active learning, artificial intelligence, machine learning, (15 more...)

arXiv.org Machine Learning

1801.07875

Country:

Europe (1.00)
North America > United States > California (0.68)

Genre:

Research Report > Experimental Study (0.68)
Research Report > New Finding (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)

Support Vector Machines for Binary Classification - MATLAB & Simulink

#artificialintelligence, Jan-23-2018, 18:27:57 GMT

You can use a support vector machine (SVM) when your data has exactly two classes. An SVM classifies data by finding the best hyperplane that separates all data points of one class from those of the other class. The best hyperplane for an SVM means the one with the largest margin between the two classes. Margin means the maximal width of the slab parallel to the hyperplane that has no interior data points. The support vectors are the data points that are closest to the separating hyperplane; these points are on the boundary of the slab.
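The margin and support vector definitions above can be made concrete with a small sketch in plain Python. It assumes a separating hyperplane $w \cdot x + b = 0$ that is already scaled so the closest points satisfy $|w \cdot x + b| = 1$ (the usual canonical form). The weights, bias, and data points are hand-picked illustrative values, not output from MATLAB or from any trained model.

```python
import math

# Hypothetical, hand-picked hyperplane w.x + b = 0 in canonical form:
# the closest points of each class satisfy |w.x + b| = 1.
w = [1.0, 1.0]
b = -3.0

# Illustrative two-class data (label +1 or -1); values are assumptions.
points = [
    ([2.0, 2.0], +1),
    ([3.0, 3.0], +1),
    ([1.0, 1.0], -1),
    ([0.0, 0.5], -1),
]

def decision_value(x):
    """Signed score w.x + b; its sign is the predicted class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

norm_w = math.sqrt(sum(wi * wi for wi in w))

# Width of the slab between the two margin boundaries: 2 / ||w||.
margin_width = 2.0 / norm_w

# Support vectors lie on the slab boundary, i.e. y * (w.x + b) == 1.
support_vectors = [
    x for x, y in points if math.isclose(y * decision_value(x), 1.0)
]

print(f"margin width: {margin_width:.4f}")
print("support vectors:", support_vectors)
```

Training an SVM amounts to searching for the `w` and `b` that make this slab as wide as possible while keeping every point on its correct side.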

artificial intelligence, hyperplane, machine learning, (9 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)

The Value of Semi-Supervised Machine Learning

#artificialintelligence, Jan-17-2018, 22:58:51 GMT

Your boss hands you a pile of 100,000 unlabeled images and asks you to categorize whether they are sandals, pants, boots, etc. So now you have a massive set of unlabeled data and you need labels. Lots of companies are swimming in data, whether it's transactional data, IoT sensor readings, security logs, images, voice, or more, and it's all unlabeled. With so little labeled data, building machine learning models is a tedious and slow process for data scientists in almost all enterprises. Take Google's street view data. Gebru had to figure out how to label cars in 50 million images with very little labeled data.

artificial intelligence, autoencoder, machine learning, (12 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.30)

Generalizing, Decoding, and Optimizing Support Vector Machine Classification

Krell, Mario Michael

arXiv.org Machine Learning, Jan-15-2018

The classification of complex data usually requires the composition of processing steps. Here, a major challenge is the selection of optimal algorithms for preprocessing and classification (including parameterizations). Nowadays, parts of the optimization process are automated, but expert knowledge and manual work are still required. We present three steps to address this process and ease the optimization. Namely, we take a theoretical view on classical classifiers, provide an approach to interpret the classifier together with the preprocessing, and integrate both into one framework which enables a semiautomatic optimization of the processing chain and which interfaces numerous algorithms.

machine learning, processing and classification environment pyspace, programming language, (18 more...)

arXiv.org Machine Learning

1801.04929

Country:

Europe > Germany (1.00)
Europe > United Kingdom > England (0.45)
North America > United States > California (0.27)

Genre:

Instructional Material > Course Syllabus & Notes (0.92)
Workflow (0.92)
Summary/Review (0.92)
Research Report > Promising Solution (0.92)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Education (1.00)
Health & Medicine > Health Care Technology (0.92)
(3 more...)

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)
