Goto

Collaborating Authors

 Support Vector Machines


A Consolidated Cross-Validation Algorithm for Support Vector Machines via Data Reduction

Neural Information Processing Systems

We propose a consolidated cross-validation (CV) algorithm for training and tuning the support vector machines (SVM) on reproducing kernel Hilbert spaces. Our consolidated CV algorithm utilizes a recently proposed exact leave-one-out formula for the SVM and accelerates the SVM computation via a data reduction strategy. In addition, to compute the SVM with the bias term (intercept), which is not handled by the existing data reduction methods, we propose a novel two-stage consolidated CV algorithm. With numerical studies, we demonstrate that our algorithm is about an order of magnitude faster than the two mainstream SVM solvers, kernlab and LIBSVM, with almost the same accuracy.


1-norm Support Vector Machines

Neural Information Processing Systems

The standard 2-norm SVM is known for its good performance in two- In this paper, we consider the 1-norm SVM. We also propose an ef cient algorithm that computes the whole solution path of the 1-norm SVM, hence facilitates adaptive selection of the tuning parameter for the 1-norm SVM.


Margin Maximizing Loss Functions

Neural Information Processing Systems

Margin maximizing properties play an important role in the analysis of classi - cation models, such as boosting and support vector machines. Margin maximiza- tion is theoretically interesting because it facilitates generalization error analysis, and practically interesting because it presents a clear geometric interpretation of the models being built. We formulate and prove a suf cient condition for the solutions of regularized loss functions to converge to margin maximizing separa- tors, as the regularization vanishes. This condition covers the hinge loss of SVM, the exponential loss of AdaBoost and logistic regression loss. We also generalize it to multi-class classi cation problems, and present margin maximizing multi- class versions of logistic regression and support vector machines.


Parallelizing Support Vector Machines on Distributed Computers

Neural Information Processing Systems

Support Vector Machines (SVMs) suffer from a widely recognized scalability problem in both memory use and computational time. To improve scalability, we have developed a parallel SVM algorithm (PSVM), which reduces memory use through performing a row-based, approximate matrix factorization, and which loads only essential data to each machine to perform parallel computation. Let n denote the number of training instances, p the reduced matrix dimension after factorization ( p is significantly smaller than n), and m the number of machines. PSVM reduces the memory requirement from \MO ( n 2) to \MO ( np/m), and improves computation time to \MO ( np 2/m). Empirical studies on up to 500 computers shows PSVM to be effective.


A Risk Minimization Principle for a Class of Parzen Estimators

Neural Information Processing Systems

This paper explores the use of a Maximal Average Margin (MAM) optimality principle for the design of learning algorithms. It is shown that the application of this risk minimization principle results in a class of (computationally) simple learning machines similar to the classical Parzen window classifier. A direct relation with the Rademacher complexities is established, as such facilitating analysis and providing a notion of certainty of prediction. This analysis is related to Support Vector Machines by means of a margin transformation. The power of the MAM principle is illustrated further by application to ordinal regression tasks, resulting in an O(n) algorithm able to process large datasets in reasonable time.


Lower Bounds on Rate of Convergence of Cutting Plane Methods

Neural Information Processing Systems

In a recent paper Joachims (2006) presented SVM-Perf, a cutting plane method (CPM) for training linear Support Vector Machines (SVMs) which converges to an \epsilon accurate solution in O(1/\epsilon {2}) iterations. By tightening the analysis, Teo et al. (2010) showed that O(1/\epsilon) iterations suffice. Given the impressive convergence speed of CPM on a number of practical problems, it was conjectured that these rates could be further improved. In this paper we disprove this conjecture. We present counter examples which are not only applicable for training linear SVMs with hinge loss, but also hold for support vector methods which optimize a \emph{multivariate} performance score.


Universal Kernels on Non-Standard Input Spaces

Neural Information Processing Systems

During the last years support vector machines (SVMs) have been successfully applied even in situations where the input space X is not necessarily a subset of R d . Examples include SVMs using probability measures to analyse e.g. Moreover, SVMs are known to be consistent to the Bayes risk, if either the input space is a complete separable metric space and the reproducing kernel Hilbert space (RKHS) H\subset L_p(P_X) is dense, or if the SVM is based on a universal kernel k . So far, however, there are no RKHSs of practical interest known that satisfy these assumptions on \cH or k if X ot\subset R d . We close this gap by providing a general technique based on Taylor-type kernels to explicitly construct universal kernels on compact metric spaces which are not subset of R d .


Approximating Concavely Parameterized Optimization Problems

Neural Information Processing Systems

We consider an abstract class of optimization problems that are parameterized concavely in a single parameter, and show that the solution path along the parameter can always be approximated with accuracy \varepsilon 0 by a set of size O(1/\sqrt{\varepsilon}) . A lower bound of size \Omega (1/\sqrt{\varepsilon}) shows that the upper bound is tight up to a constant factor. We also devise an algorithm that calls a step-size oracle and computes an approximate path of size O(1/\sqrt{\varepsilon}) . Finally, we provide an implementation of the oracle for soft-margin support vector machines, and a parameterized semi-definite program for matrix completion.


Classification Using Global and Local Mahalanobis Distances

arXiv.org Machine Learning

We propose a novel semi-parametric classifier based on Mahalanobis distances of an observation from the competing classes. Our tool is a generalized additive model with the logistic link function that uses these distances as features to estimate the posterior probabilities of the different classes. While popular parametric classifiers like linear and quadratic discriminant analyses are mainly motivated by the normality of the underlying distributions, the proposed classifier is more flexible and free from such parametric assumptions. Since the densities of elliptic distributions are functions of Mahalanobis distances, this classifier works well when the competing classes are (nearly) elliptic. In such cases, it often outperforms popular nonparametric classifiers, especially when the sample size is small compared to the dimension of the data. To cope with non-elliptic and possibly multimodal distributions, we propose a local version of the Mahalanobis distance. Subsequently, we propose another classifier based on a generalized additive model that uses the local Mahalanobis distances as features. This nonparametric classifier usually performs like the Mahalanobis distance based semiparametric classifier when the underlying distributions are elliptic, but outperforms it for several non-elliptic and multimodal distributions. We also investigate the behaviour of these two classifiers in high dimension, low sample size situations. A thorough numerical study involving several simulated and real datasets demonstrate the usefulness of the proposed classifiers in comparison to many state-of-the-art methods.


Intelligent Diagnosis of Alzheimer's Disease Based on Machine Learning

arXiv.org Artificial Intelligence

This study is based on the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset and aims to explore early detection and disease progression in Alzheimer's disease (AD). We employ innovative data preprocessing strategies, including the use of the random forest algorithm to fill missing data and the handling of outliers and invalid data, thereby fully mining and utilizing these limited data resources. Through Spearman correlation coefficient analysis, we identify some features strongly correlated with AD diagnosis. We build and test three machine learning models using these features: random forest, XGBoost, and support vector machine (SVM). Among them, the XGBoost model performs the best in terms of diagnostic performance, achieving an accuracy of 91%. Overall, this study successfully overcomes the challenge of missing data and provides valuable insights into early detection of Alzheimer's disease, demonstrating its unique research value and practical significance.