
 svm


Support vector machines and linear regression coincide with very high-dimensional features

Neural Information Processing Systems

The support vector machine (SVM) and minimum Euclidean norm least squares regression are two fundamentally different approaches to fitting linear models, but they have recently been connected in models for very high-dimensional data through a phenomenon of support vector proliferation, where every training example used to fit an SVM becomes a support vector. In this paper, we explore the generality of this phenomenon and make the following contributions. First, we prove a super-linear lower bound on the dimension (in terms of sample size) required for support vector proliferation in independent feature models, matching the upper bounds from previous works. We further identify a sharp phase transition in Gaussian feature models, bound the width of this transition, and give experimental support for its universality. Finally, we hypothesize that this phase transition occurs only in much higher-dimensional settings in the $\ell_1$ variant of the SVM, and we present a new geometric characterization of the problem that may elucidate this phenomenon for the general $\ell_p$ case.
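The coincidence described above can be checked numerically. The sketch below is not the authors' code. It assumes a hard-margin SVM without a bias term and relies on the standard KKT argument. The minimum-norm interpolator $w = X^\top c$ with $c = (XX^\top)^{-1} y$ meets every margin constraint with equality, and it is the SVM solution precisely when $y_i c_i \ge 0$ for all $i$, with dual weights $\alpha_i = y_i c_i$. The check uses the strict inequality, so every example is a support vector with positive weight. The sample size, dimension, and Gaussian features are illustrative choices.

```python
import numpy as np

def min_norm_interpolator(X, y):
    """Minimum Euclidean norm w with X @ w = y (assumes n <= d and X has full row rank)."""
    K = X @ X.T                          # n x n Gram matrix
    c = np.linalg.solve(K, y)            # c = K^{-1} y, so w = X^T c
    return X.T @ c, c

def support_vector_proliferation(X, y):
    """True if y_i * c_i > 0 for all i: the interpolator then satisfies the KKT
    conditions of the hard-margin SVM (no bias) with dual weights alpha_i = y_i * c_i,
    so every training example is a support vector."""
    _, c = min_norm_interpolator(X, y)
    return bool(np.all(y * c > 0))

rng = np.random.default_rng(0)
n, d = 10, 5000                          # illustrative: dimension far above sample size
X = rng.standard_normal((n, d))          # independent Gaussian features
y = rng.choice([-1.0, 1.0], size=n)      # arbitrary +/-1 labels
w, _ = min_norm_interpolator(X, y)
print(np.allclose(y * (X @ w), 1.0))     # every margin y_i <w, x_i> equals 1
print(support_vector_proliferation(X, y))
```

Lowering `d` toward `n` in this sketch makes the check fail more often, which is the transition the abstract studies.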


An efficient, accurate, and interpretable machine learning method for computing probability of failure

Zhu, Jacob, Estep, Donald

arXiv.org Machine Learning

We introduce a novel machine learning method, the Penalized Profile Support Vector Machine based on the Gabriel edited set, for computing the probability of failure of a complex system, where failure is determined by a threshold condition on a computer model of system behavior. The method is designed to minimize the number of evaluations of the computer model while preserving the geometry of the decision boundary that determines the probability. It employs an adaptive sampling strategy that allocates points near the boundary determining failure, and it builds a locally linear surrogate boundary that remains consistent with that boundary's geometry by strategically clustering the training points. We prove two convergence results and compare the performance of the method against a number of state-of-the-art classification methods on four test problems. We also apply the method to determine the probability of survival using the Lotka–Volterra model for competing species.
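For context, the quantity being computed is a probability such as $p = P(Q(X) > \tau)$ for a model output $Q$ and threshold $\tau$. The sketch below is plain Monte Carlo, not the Penalized Profile SVM. It needs one model evaluation per sample, which is the cost the adaptive method is designed to reduce. The toy model $Q(x) = x_1^2 + x_2^2$, the uniform inputs on $[-1, 1]^2$, and $\tau = 1$ are hypothetical stand-ins for an expensive computer model. For this choice, the exact answer is $1 - \pi/4 \approx 0.2146$.

```python
import numpy as np

def toy_model(x):
    """Hypothetical stand-in for an expensive computer model: Q(x) = x1^2 + x2^2."""
    return x[:, 0] ** 2 + x[:, 1] ** 2

def mc_failure_probability(model, threshold, n_samples, rng):
    """Plain Monte Carlo estimate of P(model(X) > threshold), X ~ Uniform([-1, 1]^2).
    Every sample costs one model evaluation."""
    X = rng.uniform(-1.0, 1.0, size=(n_samples, 2))
    failed = model(X) > threshold        # threshold condition defining failure
    p = failed.mean()
    std_err = np.sqrt(p * (1 - p) / n_samples)
    return p, std_err

rng = np.random.default_rng(0)
p, se = mc_failure_probability(toy_model, 1.0, 200_000, rng)
print(f"estimate {p:.4f} +/- {se:.4f}; exact {1 - np.pi / 4:.4f}")
```

The standard error shrinks only like $1/\sqrt{N}$, so a small failure probability needs very many model runs. This is why approaches that concentrate evaluations near the failure boundary are attractive.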



On the Equivalence between Neural Network and Support Vector Machine

Neural Information Processing Systems

Recent research shows that the dynamics of an infinitely wide neural network (NN) trained by gradient descent can be characterized by the Neural Tangent Kernel (NTK) \citep{jacot2018neural}. Under the squared loss, an infinite-width NN trained by gradient descent with an infinitesimally small learning rate is equivalent to kernel regression with the NTK \citep{arora2019exact}. However, this equivalence is currently known only for ridge regression \citep{arora2019harnessing}, while the equivalence between NNs and other kernel machines (KMs), e.g.
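As a concrete instance of kernel regression with an NTK, the sketch below uses the closed-form NTK of a two-layer ReLU network at infinite width. It assumes that only the hidden-layer weights are trained, that they start from a standard Gaussian initialization, that the output weights are fixed at ±1, and that the inputs have unit norm. This architecture is chosen for brevity and is not necessarily the setting of the cited works. With no ridge term, the predictor interpolates the training labels. This corresponds to gradient flow on the squared loss run to convergence, ignoring the network's random initial output. A positive `ridge` gives kernel ridge regression instead.

```python
import numpy as np

def relu_ntk(X1, X2):
    """NTK of an infinitely wide two-layer ReLU net, hidden layer trained only,
    for unit-norm inputs: k(x, x') = <x, x'> (pi - arccos<x, x'>) / (2 pi)."""
    G = np.clip(X1 @ X2.T, -1.0, 1.0)            # cosine similarities (unit-norm inputs)
    return G * (np.pi - np.arccos(G)) / (2 * np.pi)

def kernel_regression(X_train, y_train, X_test, ridge=0.0):
    """Kernel (ridge) regression predictor f(x) = k(x, X) (K + ridge * I)^{-1} y."""
    K = relu_ntk(X_train, X_train)
    alpha = np.linalg.solve(K + ridge * np.eye(len(X_train)), y_train)
    return relu_ntk(X_test, X_train) @ alpha

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))
X /= np.linalg.norm(X, axis=1, keepdims=True)    # project inputs onto the unit sphere
y = rng.standard_normal(5)
print(kernel_regression(X, y, X))                # ridge=0 interpolates: reproduces y
```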


Learning from Few Samples: Transformation-Invariant SVMs with Composition and Locality at Multiple Scales

Neural Information Processing Systems

Motivated by the problem of learning with small sample sizes, this paper shows how to incorporate into support vector machines (SVMs) those properties that have made convolutional neural networks (CNNs) successful. Particularly important is the ability to incorporate domain knowledge of invariances, e.g., translational invariance of images. Kernels based on the maximum similarity over a group of transformations are not, in general, positive definite. Perhaps for this reason, they have not been studied theoretically. We address this gap and show that positive definiteness does hold with high probability for kernels based on the maximum similarity in the small training sample regime of interest, and that these kernels yield the best results in that regime. We also show how additional properties can be embedded into SVMs, such as the ability to incorporate local features at multiple spatial scales (as CNNs do through max pooling) and the benefits of composition through a multi-layer architecture. We verify through experiments on widely available image sets that the resulting SVMs provide superior accuracy compared with well-established deep neural network benchmarks for small sample sizes.
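To make the idea concrete, the sketch below builds a maximum-similarity kernel over a transformation group, $k(x, y) = \max_g \langle x, g \cdot y \rangle$. It rests on three illustrative assumptions: signals are 1-D vectors, the group is all circular shifts, and the similarity is the plain inner product. The kernel is shift-invariant by construction, but nothing guarantees that its Gram matrix is positive semidefinite. The sketch therefore reports the smallest eigenvalue instead of assuming it is nonnegative.

```python
import numpy as np

def max_shift_kernel(x, y):
    """Maximum inner-product similarity over all circular shifts of y
    (hypothetical choice of transformation group and similarity)."""
    return max(np.dot(x, np.roll(y, s)) for s in range(len(y)))

def gram_matrix(samples):
    """Pairwise kernel values for a list of equal-length 1-D signals."""
    n = len(samples)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = max_shift_kernel(samples[i], samples[j])
    return K

rng = np.random.default_rng(0)
signals = [rng.standard_normal(16) for _ in range(6)]   # small training set
K = gram_matrix(signals)
x, y = signals[0], signals[1]
print(np.isclose(max_shift_kernel(x, np.roll(y, 5)), max_shift_kernel(x, y)))  # shift-invariant
print("smallest eigenvalue:", np.linalg.eigvalsh(K).min())  # need not be >= 0 in general
```

The resulting matrix `K` could be passed to any SVM solver that accepts a precomputed kernel. When its smallest eigenvalue is negative, the usual convex-optimization guarantees no longer apply.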


Realistic Handwritten Multi-Digit Writer (MDW) Number Recognition Challenges

Wagstaff, Kiri L.

arXiv.org Artificial Intelligence

Isolated digit classification has served as a motivating problem for decades of machine learning research. In real settings, numbers often occur as multiple digits, all written by the same person. Examples include ZIP Codes, handwritten check amounts, and appointment times. In this work, we leverage knowledge about the writers of NIST digit images to create more realistic benchmark multi-digit writer (MDW) data sets. As expected, we find that classifiers may perform well on isolated digits yet do poorly on multi-digit number recognition. If we want to solve real number recognition problems, additional advances are needed. The MDW benchmarks come with task-specific performance metrics that go beyond typical error calculations to more closely align with real-world impact. They also create opportunities to develop methods that can leverage task-specific knowledge to improve performance well beyond that of individual digit classification methods.
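The gap between digit-level and number-level performance can be illustrated with a small metric sketch. The function below and its three metrics are hypothetical examples of task-specific measures, not the metrics defined for the MDW benchmarks. The metrics are per-digit accuracy, exact-match accuracy on whole numbers, and mean absolute error of the numeric value, and the toy strings are made up for illustration.

```python
def mdw_style_metrics(true_numbers, predicted_numbers):
    """Compare digit strings of equal length.
    Returns (digit accuracy, whole-number accuracy, mean absolute numeric error)."""
    digit_hits = digit_total = exact = 0
    abs_errors = []
    for truth, pred in zip(true_numbers, predicted_numbers):
        assert len(truth) == len(pred), "sketch assumes aligned, equal-length strings"
        digit_hits += sum(t == p for t, p in zip(truth, pred))
        digit_total += len(truth)
        exact += truth == pred           # a number counts only if every digit is right
        abs_errors.append(abs(int(truth) - int(pred)))
    n = len(true_numbers)
    return digit_hits / digit_total, exact / n, sum(abs_errors) / n

truth = ["12345", "0250", "930"]         # made-up digit strings
pred = ["12345", "0256", "980"]
print(mdw_style_metrics(truth, pred))    # (0.8333..., 0.3333..., 18.666...)
```

Both wrong numbers contain exactly one wrong digit, yet their numeric errors are 6 and 50. For a check amount, that difference matters in a way a digit-level error rate cannot show.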


Practical Machine Learning for Aphasic Discourse Analysis

Pittman, Jason M., Phillips, Anton Jr., Medina-Santos, Yesenia, Stark, Brielle C.

arXiv.org Artificial Intelligence

Analyzing spoken discourse is a valid means of quantifying language ability in persons with aphasia. There are many ways to quantify discourse. One common way is to evaluate its informativeness: given the total number of words produced, how many of those words are context-relevant and accurate? This type of analysis is called Correct Information Unit (CIU) analysis, and it is one of the most prevalent discourse analyses used by speech-language pathologists (SLPs). Despite this, CIU analysis remains limited in the clinic because of the manual labor SLPs need to code and analyze collected speech. Recent advances in machine learning (ML) seek to support such labor by automating the modeling of propositional, macrostructural, pragmatic, and multimodal dimensions of discourse. To that end, this study evaluated five ML models for reliable identification of CIUs (Nicholas & Brookshire, 1993) during a picture description task. The five supervised ML models were trained on randomly selected human-coded transcripts from persons with aphasia, together with their accompanying word and CIU codes. For the baseline task of word vs. non-word classification, all models achieved near-perfect performance across transcripts (0.995), with AUC ranging from 0.914 to 0.995. In contrast, CIU vs. non-CIU classification showed greater variability, with the k-nearest neighbor (k-NN) model achieving the highest accuracy (0.824) and the second-highest AUC (0.787). These findings indicate that while supervised ML models can distinguish words from non-words, identifying CIUs remains challenging.
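The informativeness measure described above reduces to simple counting once each token has been coded. The sketch below computes the percentage of produced words that are CIUs. It assumes a hypothetical per-token coding: a token is a non-word, a word, or a word that is also a CIU. It is not the study's pipeline, and the toy transcript is invented.

```python
from dataclasses import dataclass

@dataclass
class Token:
    text: str
    is_word: bool        # hypothetical coding: counts as a word
    is_ciu: bool         # hypothetical coding: word is context-relevant and accurate

def percent_ciu(tokens):
    """Share of produced words that are CIUs, as a percentage."""
    words = [t for t in tokens if t.is_word]
    if not words:
        return 0.0
    cius = sum(t.is_ciu for t in words)
    return 100.0 * cius / len(words)

transcript = [
    Token("the", True, True), Token("boy", True, True), Token("um", False, False),
    Token("is", True, True), Token("climbing", True, True), Token("thing", True, False),
]
print(percent_ciu(transcript))   # 4 CIUs out of 5 words -> 80.0
```

In this framing, the two classification tasks in the study correspond to predicting `is_word` and `is_ciu` for each token, which would remove the need for manual coding.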


Large Margin Discriminant Dimensionality Reduction in Prediction Space

Neural Information Processing Systems

In this paper, we establish a duality between boosting and SVM, and use it to derive a novel discriminant dimensionality reduction algorithm. In particular, using the multiclass formulations of boosting and SVM, we note that both use a combination of mapping and linear classification to maximize the multiclass margin. In SVM, this is implemented by using a pre-defined mapping (induced by the kernel) and optimizing the linear classifiers. In boosting, the linear classifiers are pre-defined and the mapping (predictor) is learned through a combination of weak learners. We argue that the intermediate mapping, e.g.
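The shared structure described above, a mapping followed by linear classifiers scored by the multiclass margin, can be written in a few lines. In this sketch, the margin of an example $(x, y)$ is taken as $\langle w_y, \phi(x) \rangle - \max_{j \ne y} \langle w_j, \phi(x) \rangle$, one common multiclass definition. The toy mapping `phi` and the class vectors in `W` are illustrative assumptions. In the SVM view, `phi` would be fixed by the kernel and `W` learned. In the boosting view, `W` would be fixed and `phi` assembled from weak learners.

```python
import numpy as np

def multiclass_margin(phi_x, W, y):
    """Margin of one example: score of the true class minus the best other score.
    phi_x: mapped features, shape (k,); W: class vectors as rows, shape (C, k)."""
    scores = W @ phi_x
    others = np.delete(scores, y)
    return scores[y] - others.max()

def phi(x):
    """Hypothetical fixed mapping (stand-in for a kernel-induced feature map)."""
    return np.array([x[0], x[1], x[0] * x[1]])

W = np.eye(3)                       # hypothetical linear classifiers, one row per class
x, label = np.array([2.0, 0.5]), 0
print(multiclass_margin(phi(x), W, label))   # scores [2.0, 0.5, 1.0] -> 2.0 - 1.0 = 1.0
```

A positive margin means the example is classified correctly. Both methods aim to make this quantity large across the training set.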