 Support Vector Machines


Using Kernel Methods and Model Selection for Prediction of Preterm Birth

arXiv.org Machine Learning

We describe an application of machine learning to the problem of predicting preterm birth. We conduct a secondary analysis of a clinical trial dataset collected by the National Institute of Child Health and Human Development (NICHD), focusing on predicting different classes of preterm birth. We compare three approaches for deriving predictive models: a support vector machine (SVM) approach with linear and non-linear kernels, logistic regression with different model selection procedures, and a model based on decision rules prescribed by physician experts for prediction of preterm birth. Our approach highlights the pre-processing methods applied to handle the inherent dynamics, noise and gaps in the data, and we describe techniques used to handle skewed class distributions. Empirical experiments demonstrate significant improvement in predicting preterm birth compared to past work.


The Kernel Trick

#artificialintelligence

The goal of this writeup is to provide a high-level introduction to the "Kernel Trick" commonly used in classification algorithms such as Support Vector Machines (SVM) and Logistic Regression. My target audience is readers who have had some basic experience with machine learning, yet are looking for an alternative introduction to kernel methods. We first examine an example that motivates the need for kernel methods. After explaining the "Kernel Trick", we apply kernels to improve classification results. The following code examples are in Python, and make heavy use of the sklearn, numpy, and scipy libraries.
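As a minimal illustration of the idea (not taken from the writeup itself), the sketch below assumes a degree-2 homogeneous polynomial kernel on 2-D points. The feature map `phi` and the sample points are illustrative choices. It shows that the kernel evaluated in the input space gives the same value as an inner product in the explicit feature space, which is what lets kernel methods avoid computing the feature map.

```python
import math

def dot(a, b):
    """Plain inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))

def phi(x):
    """Explicit degree-2 feature map for a 2-D point (illustrative choice)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def poly_kernel(x, y):
    """Homogeneous polynomial kernel of degree 2: k(x, y) = (x . y)^2."""
    return dot(x, y) ** 2

x, y = (1.0, 2.0), (3.0, -1.0)
explicit = dot(phi(x), phi(y))  # inner product in the 3-D feature space
implicit = poly_kernel(x, y)    # same quantity, computed in the 2-D input space
print(explicit, implicit)
```

For this kernel the feature space is only three-dimensional, so the saving is small. For kernels such as the Gaussian RBF, the corresponding feature space is infinite-dimensional and only the implicit form is computable.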


A Boundary Tilting Persepective on the Phenomenon of Adversarial Examples

arXiv.org Machine Learning

Deep neural networks have been shown to suffer from a surprising weakness: their classification outputs can be changed by small, non-random perturbations of their inputs. This adversarial example phenomenon has been explained as originating from deep networks being "too linear" (Goodfellow et al., 2014). We show here that the linear explanation of adversarial examples presents a number of limitations: the formal argument is not convincing, linear classifiers do not always suffer from the phenomenon, and when they do their adversarial examples are different from the ones affecting deep networks. We propose a new perspective on the phenomenon. We argue that adversarial examples exist when the classification boundary lies close to the submanifold of sampled data, and present a mathematical analysis of this new perspective in the linear case. We define the notion of adversarial strength and show that it can be reduced to the deviation angle between the classifier considered and the nearest centroid classifier. Then, we show that the adversarial strength can be made arbitrarily high independently of the classification performance due to a mechanism that we call boundary tilting. This result leads us to defining a new taxonomy of adversarial examples. Finally, we show that the adversarial strength observed in practice is directly dependent on the level of regularisation used and the strongest adversarial examples, symptomatic of overfitting, can be avoided by using a proper level of regularisation.
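The deviation angle mentioned in the abstract can be sketched for a toy case. The code below assumes a linear classifier given by a weight vector `w` and takes the nearest centroid classifier's direction to be the difference of the two class centroids. The helper names and the toy data are hypothetical, and the paper's exact definition of adversarial strength is not reproduced here.

```python
import math

def centroid(points):
    """Coordinate-wise mean of a list of equal-length points."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def deviation_angle(w, class_a, class_b):
    """Angle in radians between w and the nearest centroid direction."""
    ca, cb = centroid(class_a), centroid(class_b)
    d = [a - b for a, b in zip(ca, cb)]  # normal of the nearest centroid boundary
    dot_wd = sum(wi * di for wi, di in zip(w, d))
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    norm_d = math.sqrt(sum(di * di for di in d))
    cos = max(-1.0, min(1.0, dot_wd / (norm_w * norm_d)))  # guard against rounding
    return math.acos(cos)

# Toy data: two classes separated along the first axis.
class_a = [(2.0, 0.0), (2.0, 1.0), (2.0, -1.0)]
class_b = [(-2.0, 0.0), (-2.0, 1.0), (-2.0, -1.0)]

aligned = deviation_angle((1.0, 0.0), class_a, class_b)  # boundary not tilted
tilted = deviation_angle((1.0, 1.0), class_a, class_b)   # boundary tilted by 45 degrees
print(aligned, tilted)
```

Both weight vectors classify the toy data correctly, yet they differ in deviation angle, which mirrors the abstract's point that tilting can vary independently of classification performance.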


Understanding the Energy and Precision Requirements for Online Learning

arXiv.org Machine Learning

It is well-known that the precision of data, hyperparameters, and internal representations employed in learning systems directly impacts their energy, throughput, and latency. The precision requirements for the training algorithm are also important for systems that learn on-the-fly. Prior work has shown that the data and hyperparameters can be quantized heavily without incurring much penalty in classification accuracy when compared to floating point implementations. These works suffer from two key limitations. First, they assume uniform precision for the classifier and for the training algorithm and thus miss out on the opportunity to further reduce precision. Second, prior works are empirical studies. In this article, we overcome both these limitations by deriving analytical lower bounds on the precision requirements of the commonly employed stochastic gradient descent (SGD) on-line learning algorithm in the specific context of a support vector machine (SVM). Lower bounds on the data precision are derived in terms of the desired classification accuracy and precision of the hyperparameters used in the classifier. Additionally, lower bounds on the hyperparameter precision in the SGD training algorithm are obtained. These bounds are validated using both synthetic data and the UCI breast cancer dataset. Additionally, the impact of these precisions on the energy consumption of a fixed-point SVM with on-line training is studied.
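To make the setting concrete, here is a rough sketch of one SGD step for a linear SVM with hinge loss in which the data and the weights are rounded to separate fixed-point precisions. The function names, the value range of [-1, 1), and the bit widths are all assumptions made for illustration. The analytical bounds derived in the paper are not reproduced.

```python
def quantize(value, bits, limit=1.0):
    """Round value to a signed fixed-point grid with `bits` bits on [-limit, limit)."""
    step = 2.0 * limit / (2 ** bits)
    clipped = max(-limit, min(limit - step, value))
    return round(clipped / step) * step

def sgd_svm_step(w, x, y, lr, reg, data_bits, weight_bits):
    """One hinge-loss SGD update with quantized inputs and quantized weights."""
    xq = [quantize(xi, data_bits) for xi in x]
    margin = y * sum(wi * xi for wi, xi in zip(w, xq))
    new_w = []
    for wi, xi in zip(w, xq):
        # Subgradient of reg/2 * |w|^2 + max(0, 1 - y * w.x) with respect to wi.
        grad = reg * wi - (y * xi if margin < 1.0 else 0.0)
        new_w.append(quantize(wi - lr * grad, weight_bits))
    return new_w

w = sgd_svm_step([0.0, 0.0], [0.5, -0.5], 1, lr=0.5, reg=0.0,
                 data_bits=8, weight_bits=8)
print(w)
```

Keeping `data_bits` and `weight_bits` as separate parameters reflects the abstract's observation that assuming one uniform precision everywhere misses opportunities to reduce precision further.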


Looking backwards, looking forwards: SAS, data mining, and machine learning

@machinelearnbot

Looking forward, ten of my SAS colleagues and I are heading to New York City this weekend for KDD 2014: Data Science for the Social Good, which runs August 24-27. This event's full name is the 20th Association for Computing Machinery Special Interest Group on Knowledge Discovery and Data Mining, but it is more commonly known as ACM SIGKDD, or just KDD for short. Looking backwards, the first KDD workshop was held in 1989, and these workshops eventually grew into the series of conferences. Whether you still call it data mining, or prefer machine learning or data science, the fact that this year the conference is sold out, with the 2,200 registered exceeding all expectations, is a sign of the trending of this topic. KDD's tagline today is "bringing together the data mining, data science, and analytics community," so this nexus is right where SAS has played for years.


Kernel tricks and nonlinear dimensionality reduction via RBF kernel PCA

#artificialintelligence

Most machine learning algorithms have been developed and statistically validated for linearly separable data. Popular examples are linear classifiers like Support Vector Machines (SVMs) or the (standard) Principal Component Analysis (PCA) for dimensionality reduction. However, most real world data requires nonlinear methods in order to perform tasks that involve the analysis and discovery of patterns successfully. The focus of this article is to briefly introduce the idea of kernel methods and to implement a Gaussian radial basis function (RBF) kernel that is used to perform nonlinear dimensionality reduction via RBF kernel principal component analysis (kPCA). The main purpose of principal component analysis (PCA) is the analysis of data to identify patterns that represent the data "well." The principal components can be understood as new axes of the dataset that maximize the variance along those axes (the eigenvectors of the covariance matrix).
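A compact sketch of RBF kernel PCA using only numpy follows. It is one common formulation and is not necessarily the article's own implementation. The function name, the `gamma` value, and the two-ring toy data are assumptions chosen for illustration.

```python
import numpy as np

def rbf_kernel_pca(X, gamma, n_components):
    """Project the rows of X onto the top principal components in RBF feature space."""
    # Pairwise squared Euclidean distances between all rows of X.
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negatives from rounding

    # Gaussian RBF kernel matrix: k(x, y) = exp(-gamma * ||x - y||^2).
    K = np.exp(-gamma * sq_dists)

    # Center the kernel matrix, which centers the data in feature space.
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    K_centered = K - one_n @ K - K @ one_n + one_n @ K @ one_n

    # eigh returns eigenvalues in ascending order, so pick the largest ones.
    eigvals, eigvecs = np.linalg.eigh(K_centered)
    idx = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[idx], eigvecs[:, idx]

    # Projection of each training point: eigenvector scaled by sqrt(eigenvalue).
    return eigvecs * np.sqrt(np.maximum(eigvals, 0.0))

# Toy data: two concentric rings, which are not linearly separable in 2-D.
rng = np.random.default_rng(0)
angles = rng.uniform(0.0, 2.0 * np.pi, size=100)
radii = np.repeat([1.0, 3.0], 50)
X = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])

Z = rbf_kernel_pca(X, gamma=1.0, n_components=2)
print(Z.shape)
```

How well the projected rings separate depends on `gamma`, so in practice it is treated as a tuning parameter.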


Machine learning for beginners: Popular techniques & algorithms - Big Data Analytics Guide

#artificialintelligence

Machine Learning is all about using computer systems and applying statistical techniques and algorithms to identify patterns in data, learn from it and provide data-driven trends, predictions and decisions. Machine learning algorithms have two flavors: Supervised learning and Unsupervised learning.


Application of multiview techniques to NHANES dataset

arXiv.org Machine Learning

Research into disease-related health variables typically involves choosing health variables and conditions, and using statistical methods to study the strength of association of the variables with the condition [9]. These are then used to confirm known or suspected relationships between the behavioural/health factors or disease conditions. There may be information about health status that may be gleaned by considering different aspects of an individual's data, and investigating possible relationships between the variables. Representations that capture these relationships can be useful in predicting presence or risk level of medical conditions. The National Health and Nutrition Examination Survey (NHANES) dataset provides data on health measurements, taken from survey participants, comprising different categories including demographics, laboratory tests and physical measurements.


Top July stories: Bayesian Machine Learning, Explained; Why Big Data is in Trouble: They Forgot About Applied Statistics

#artificialintelligence

Most viewed July stories: Bayesian Machine Learning, Explained; Why Big Data is in Trouble: They Forgot About Applied Statistics; How to Start Learning Deep Learning; Top Machine Learning MOOCs and Online Lectures: A Comprehensive Survey; What Has Pokemon Got To Do With Big Data?; 5 Big Data Projects You Can No Longer Overlook; SAS vs R vs Python: Which Tool Do Analytics Pros Prefer?; Data Mining History: The Invention of Support Vector Machines; Text Mining 101: Topic Modeling; 5 Deep Learning Projects You Can No Longer Overlook

Most shared: Why Big Data is in Trouble: They Forgot About Applied Statistics; Bayesian Machine Learning, Explained; What Has Pokemon Got To Do With Big Data?; Data Mining/Data Science "Nobel Prize": 2016 SIGKDD Innovation Award to Philip S. Yu; SAS vs R vs Python: Which Tool Do Analytics Pros Prefer?; How to Start Learning Deep Learning; Data Mining History: The Invention of Support Vector Machines; 5 Big Data Projects You Can No Longer Overlook; What is Softmax Regression and How is it Related to Logistic Regression?; 7 Steps to Understanding NoSQL Databases


Viewpoint and Topic Modeling of Current Events

arXiv.org Machine Learning

There are multiple sides to every story, and while statistical topic models have been highly successful at topically summarizing the stories in corpora of text documents, they do not explicitly address the issue of learning the different sides, the viewpoints, expressed in the documents. In this paper, we show how these viewpoints can be learned completely unsupervised and represented in a human interpretable form. We use a novel approach of applying CorrLDA2 for this purpose, which learns topic-viewpoint relations that can be used to form groups of topics, where each group represents a viewpoint. A corpus of documents about the Israeli-Palestinian conflict is then used to demonstrate how a Palestinian and an Israeli viewpoint can be learned. By leveraging the magnitudes and signs of the feature weights of a linear SVM, we introduce a principled method to evaluate associations between topics and viewpoints. With this, we demonstrate, both quantitatively and qualitatively, that the learned topic groups are contextually coherent, and form consistently correct topic-viewpoint associations.