 Support Vector Machines


The Mean and Median Criterion for Automatic Kernel Bandwidth Selection for Support Vector Data Description

arXiv.org Machine Learning

Abstract: Support vector data description (SVDD) is a popular technique for detecting anomalies. The SVDD classifier partitions the whole space into an inlier region, which consists of the region near the training data, and an outlier region, which consists of points away from the training data. The computation of the SVDD classifier requires a kernel function, and the Gaussian kernel is a common choice for the kernel function. The Gaussian kernel has a bandwidth parameter, whose value is important for good results. A small bandwidth leads to overfitting, and the resulting SVDD classifier overestimates the number of anomalies. A large bandwidth leads to underfitting, and the classifier fails to detect many anomalies. In this paper we present a new automatic, unsupervised method for selecting the Gaussian kernel bandwidth. The selected value can be computed quickly, and it is competitive with existing bandwidth selection methods. Support vector data description (SVDD) is a machine learning technique that is used for single-class classification and anomaly detection.
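
The effect of the bandwidth described in this abstract can be illustrated with a small sketch. It is not the paper's selection method. It only assumes the common Gaussian kernel form exp(-||x - y||^2 / (2 s^2)), with `s` as the bandwidth, and uses made-up toy points. With a very small `s`, each point is similar only to itself, which is the regime where a boundary wraps tightly around individual points. With a very large `s`, all points look alike, which is the regime where the boundary becomes too loose.

```python
import math


def gaussian_kernel(x, y, s):
    """Gaussian kernel exp(-||x - y||^2 / (2 s^2)); `s` is the bandwidth (assumed form)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * s ** 2))


def kernel_matrix(points, s):
    """Pairwise kernel values for a list of points."""
    return [[gaussian_kernel(p, q, s) for q in points] for p in points]


def mean_off_diagonal(matrix):
    """Average similarity between distinct points."""
    n = len(matrix)
    total = sum(matrix[i][j] for i in range(n) for j in range(n) if i != j)
    return total / (n * (n - 1))


# Toy training set (illustrative values, not taken from the paper).
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]

small = mean_off_diagonal(kernel_matrix(points, s=0.05))  # near 0: points look unrelated
large = mean_off_diagonal(kernel_matrix(points, s=50.0))  # near 1: points look identical

print(f"mean similarity, small bandwidth: {small:.3e}")
print(f"mean similarity, large bandwidth: {large:.6f}")
```

A useful bandwidth lies between these two extremes, which is the range an automatic selection criterion has to locate.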


Vector Space Model as Cognitive Space for Text Classification

arXiv.org Artificial Intelligence

In this era of digitization, knowing a user's sociolect aspects has become essential for building user-specific recommendation systems. These sociolect aspects can be found by mining the language that users share as text in social media and reviews. This paper describes the experiment performed for the PAN Author Profiling 2017 shared task. The objective of the task is to find the sociolect aspects of users from their tweets. The sociolect aspects considered in this experiment are the user's gender and native language. Here, users' tweets written in a language different from their native language are represented as a document-term matrix with document frequency as the constraint. Classification is then done using a Support Vector Machine, taking gender and native language as target classes.
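
As a rough illustration of a document-term matrix with a document-frequency constraint, the sketch below keeps only terms that occur in at least `min_df` documents. The function name, the whitespace tokenizer, the threshold, and the sample tweets are all assumptions made for illustration. The abstract does not state which tokenizer or threshold the authors used.

```python
from collections import Counter


def document_term_matrix(docs, min_df=2):
    """Build a term-count matrix, keeping terms found in at least `min_df` documents."""
    tokenized = [doc.lower().split() for doc in docs]

    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    vocab = sorted(term for term, count in df.items() if count >= min_df)
    index = {term: i for i, term in enumerate(vocab)}

    matrix = []
    for tokens in tokenized:
        row = [0] * len(vocab)
        for token in tokens:
            if token in index:
                row[index[token]] += 1
        matrix.append(row)
    return vocab, matrix


# Hypothetical tweets, one document per user.
tweets = [
    "great match today",
    "great coffee today",
    "coffee and more coffee",
]
vocab, dtm = document_term_matrix(tweets, min_df=2)
print(vocab)
for row in dtm:
    print(row)
```

Each row could then serve as the feature vector for one document in a classifier such as an SVM.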



DC-Prophet: Predicting Catastrophic Machine Failures in DataCenters

arXiv.org Machine Learning

When will a server fail catastrophically in an industrial datacenter? Is it possible to forecast these failures so preventive actions can be taken to increase the reliability of a datacenter? To answer these questions, we have studied what are probably the largest, publicly available datacenter traces, containing more than 104 million events from 12,500 machines. Among these samples, we observe and categorize three types of machine failures, all of which are catastrophic and may lead to information loss, or even worse, reliability degradation of a datacenter. We further propose a two-stage framework, DC-Prophet, based on One-Class Support Vector Machine and Random Forest. DC-Prophet extracts surprising patterns and accurately predicts the next failure of a machine. Experimental results show that DC-Prophet achieves an AUC of 0.93 in predicting the next machine failure, and an F3-score of 0.88 (out of 1). On average, DC-Prophet outperforms other classical machine learning methods by 39.45% in F3-score.
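
For readers unfamiliar with the F3-score: it is the F-beta score with beta = 3, which weights recall more heavily than precision. The sketch below uses the standard F-beta formula. The precision and recall values are made-up inputs, not figures from the paper.

```python
def f_beta(precision, recall, beta=3.0):
    """Standard F-beta score; beta > 1 favors recall over precision."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# Illustrative values only: the same pair of numbers, swapped.
high_recall = f_beta(precision=0.5, recall=1.0)
high_precision = f_beta(precision=1.0, recall=0.5)
print(f"F3 with high recall:    {high_recall:.3f}")
print(f"F3 with high precision: {high_precision:.3f}")
```

Because missed failures are costlier than false alarms in failure prediction, a recall-weighted score is a natural choice for this kind of task.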


The Machine Learning Abstracts: Support Vector Machines

@machinelearnbot

In the last post, we discussed a type of classification algorithm, Decision Trees. Another machine learning algorithm that can be used for classification is the Support Vector Machine (SVM). Like any classification algorithm, support vector machines learn to assign any given data point to one of multiple classes. The key to understanding SVMs is to study how they do that. Each data point, when plotted, can be represented as a vector from the origin.


Peak Criterion for Choosing Gaussian Kernel Bandwidth in Support Vector Data Description

arXiv.org Machine Learning

Abstract: Support Vector Data Description (SVDD) is a machine-learning technique used for single class classification and outlier detection. SVDD formulation with kernel function provides a flexible boundary around data. The value of kernel function parameters affects the nature of the data boundary. For example, it is observed that with a Gaussian kernel, as the value of kernel bandwidth is lowered, the data boundary changes from spherical to wiggly. The spherical data boundary leads to underfitting, and an extremely wiggly data boundary leads to overfitting. In this paper, we propose an empirical criterion to obtain good values of the Gaussian kernel bandwidth parameter. This criterion provides a smooth boundary that captures the essential geometric features of the data. Support Vector Data Description (SVDD) is a machine learning technique used for single-class classification and outlier detection.


Machine Learning Algorithms: A Concise Technical Overview – Part 1

@machinelearnbot

Whether you are a newcomer to machine learning, a newbie to specific algorithms or concepts, or a seasoned ML vet looking for a once-over of an algorithm you haven't seen or used in a while, these short and to-the-point tutorials may provide the assistance you are looking for. Each of these posts concisely covers a single, specific machine learning concept. Support Vector Machines (SVMs) are a particular classification strategy. SVMs work by transforming the training dataset into a higher dimension, which is then inspected for the optimal separation boundary, or boundaries, between classes. In SVMs, these boundaries are referred to as hyperplanes, which are identified by locating support vectors, or the instances that most essentially define classes, and their margins, which are the lines parallel to the hyperplane defined by the shortest distance between a hyperplane and its support vectors.
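
The relationship between hyperplane, support vectors, and margins can be made concrete with a small sketch. It assumes an already-trained linear SVM in the usual canonical form, where the hyperplane is w·x + b = 0 and the margins are w·x + b = ±1. The weights and points below are hypothetical values chosen by hand, not the output of a training run.

```python
import math


def decision_value(w, b, x):
    """Signed score w.x + b; zero on the hyperplane, +1 or -1 on the margins."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w, b, x):
    """Assign class +1 or -1 according to the side of the hyperplane."""
    return 1 if decision_value(w, b, x) >= 0 else -1


def margin_width(w):
    """Distance between the two margins, 2 / ||w||."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


# Hypothetical hyperplane x1 + x2 - 3 = 0 in two dimensions.
w, b = (1.0, 1.0), -3.0

# Hand-picked support vectors: each lies exactly on one margin.
support_vectors = [((2.0, 2.0), 1), ((1.0, 1.0), -1)]

for x, label in support_vectors:
    print(x, label, decision_value(w, b, x))
print(f"margin width: {margin_width(w):.4f}")
print("new point (3, 3) ->", classify(w, b, (3.0, 3.0)))
```

Training an SVM amounts to finding the `w` and `b` that make this margin as wide as possible while keeping the classes on opposite sides.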


Python Programming Tutorials

#artificialintelligence

Welcome to a new section in our Machine Learning Tutorial series: Deep Learning with Neural Networks and TensorFlow. The artificial neural network is a biologically-inspired methodology to conduct machine learning, intended to mimic your brain (a biological neural network). The Artificial Neural Network, which I will now just refer to as a neural network, is not a new concept. The idea has been around since the 1940s, and has had a few ups and downs, most notably when compared against the Support Vector Machine (SVM). For example, the neural network was popular until the mid-90s, when it was shown that the SVM, using the "Kernel Trick" (a technique that was new to the public, although it had been thought up long before it was actually put to use), was capable of working with non-linearly separable datasets.
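
The "Kernel Trick" mentioned here can be checked numerically. In the sketch below, a degree-2 polynomial kernel on 2-D inputs gives the same value as an ordinary dot product in an explicit 3-D feature space, without ever building that space. The choice of this particular kernel and the sample vectors are assumptions made for illustration. The tutorial itself does not specify a kernel.

```python
import math


def poly_kernel(x, z):
    """Degree-2 polynomial kernel (x.z)^2, computed in the original 2-D space."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2


def phi(x):
    """Explicit feature map whose dot product equals poly_kernel."""
    return (x[0] ** 2, math.sqrt(2.0) * x[0] * x[1], x[1] ** 2)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


x, z = (1.0, 2.0), (3.0, -1.0)
via_kernel = poly_kernel(x, z)
via_feature_map = dot(phi(x), phi(z))
print(via_kernel, via_feature_map)
```

Because the two computations agree, a linear separator in the mapped space corresponds to a curved boundary in the original space, which is how an SVM can handle data that is not linearly separable.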


Application of Support Vector Machine Modeling and Graph Theory Metrics for Disease Classification

arXiv.org Machine Learning

Disease classification is a crucial element of biomedical research. Recent studies have demonstrated that machine learning techniques, such as Support Vector Machine (SVM) modeling, produce similar or improved predictive capabilities in comparison to the traditional method of Logistic Regression. In addition, it has been found that social network metrics can provide useful predictive information for disease modeling. In this study, we combine simulated social network metrics with SVM to predict diabetes in a sample of data from the Behavioral Risk Factor Surveillance System. In this dataset, Logistic Regression outperformed SVM with ROC index of 81.8 and 81.7 for models with and without graph metrics, respectively. SVM with a polynomial kernel had ROC index of 72.9 and 75.6 for models with and without graph metrics, respectively. Although this did not perform as well as Logistic Regression, the results are consistent with previous studies utilizing SVM to classify diabetes.
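
Assuming "ROC index" here means the area under the ROC curve expressed as a percentage, the sketch below shows one standard way to compute that area: the fraction of positive-negative pairs in which the positive case receives the higher score, with ties counted as one half. The labels and scores are invented for illustration and have no connection to the study's data.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Invented example: 1 = has the condition, 0 = does not.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f} (as a percentage: {100 * auc:.1f})")
```

A value of 50 on the percentage scale corresponds to random ranking and 100 to perfect ranking, which gives context for comparing figures such as 81.8 and 72.9.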


How to squeeze the most from your training data

#artificialintelligence

In many cases, the acquisition of well-labelled training data is a huge hurdle for developing accurate prediction systems with supervised learning. At Love the Sales, we needed to classify the textual metadata of 2 million products (mostly fashion and homewares) into 1,000 different categories, represented in a hierarchy. To achieve this, we have architected a hierarchical tree of chained 2-class linear (Positive vs Negative) Support Vector Machines (LibSVM), each responsible for binary document classification of each hierarchical class. A key learning is that the way in which these SVMs are structured can have a significant impact on how much training data has to be applied. For example, a naive approach would have been as follows: This approach requires that for every additional sub-category, two new SVMs be trained. For example, the addition of a new class for 'Swimwear' would require an additional SVM under Men's and Women's, not to mention the potential complexity of adding a 'Unisex' class at the top level. Overall, deep hierarchical structures can be too rigid to work with.
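
The naive layout described above can be sketched as a tree in which every node holds one binary classifier and a document only reaches a child if the parent accepts it. This is an illustration under stated assumptions, not the authors' implementation. Simple keyword predicates stand in for trained LibSVM models, and the `Node`, `keyword`, and `classify` names are hypothetical.

```python
class Node:
    """One category in the hierarchy with its own binary (accept or reject) classifier."""

    def __init__(self, name, accepts, children=()):
        self.name = name
        self.accepts = accepts  # stand-in for a trained 2-class SVM
        self.children = list(children)


def keyword(*words):
    """Hypothetical placeholder classifier: accept text containing any given word."""
    def accepts(text):
        tokens = set(text.lower().split())
        return any(word in tokens for word in words)
    return accepts


def classify(node, text, path=()):
    """Return every deepest category path whose chain of classifiers accepts the text."""
    if not node.accepts(text):
        return []
    path = path + (node.name,)
    results = []
    for child in node.children:
        results.extend(classify(child, text, path))
    return results or [path]


# Naive layout: 'Swimwear' needs a separate classifier under each gender branch.
tree = Node("Clothing", lambda text: True, [
    Node("Men's", keyword("mens"), [Node("Swimwear", keyword("swim", "trunks"))]),
    Node("Women's", keyword("womens"), [Node("Swimwear", keyword("swim", "bikini"))]),
])

print(classify(tree, "mens swim trunks"))
print(classify(tree, "womens summer dress"))
```

The duplicated 'Swimwear' nodes show the cost the text points out: each new sub-category multiplies the number of classifiers, and therefore the labelled data, that must be maintained.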