Support Vector Machines
Stochastic optimization and sparse statistical recovery: An optimal algorithm for high dimensions
Agarwal, Alekh, Negahban, Sahand, Wainwright, Martin J.
Stochastic optimization algorithms have many desirable features for large-scale machine learning, and accordingly have been the focus of renewed and intensive study in the last several years (e.g., see the papers [26, 4, 10, 30] and references therein). The empirical efficiency of these methods is backed by strong theoretical guarantees, providing sharp bounds on their convergence rates. These convergence rates are known to depend on the structure of the underlying objective function, with faster rates being possible for objective functions that are smooth and/or (strongly) convex, or optima that have desirable features such as sparsity. More precisely, for an objective function that is strongly convex, stochastic gradient descent enjoys a convergence rate ranging from O(1/T), when feature vectors are extremely sparse, to O(d/T), when feature vectors are dense [11, 19, 12]. Such results are of significant interest, because the strong convexity condition is satisfied for many common machine learning problems, including boosting, least squares regression, support vector machines and generalized linear models, among other examples. A complementary type of condition is that of sparsity, either exact or approximate, in the optimal solution.
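To make the strongly convex setting above concrete, here is a minimal sketch of plain stochastic gradient descent on an L2-regularized least-squares objective with step size 1/(lam * t). It is not the algorithm proposed in the paper; the function names, the toy data distribution, and the constants (`W_TRUE`, `LAM`, the noise level) are assumptions chosen only for illustration. The distance to the population minimizer should shrink as the number of steps T grows.

```python
import math
import random


def sgd_ridge(sample, dim, lam, steps):
    """Plain SGD on 0.5*(w.x - y)^2 + (lam/2)*||w||^2 with step 1/(lam*t)."""
    w = [0.0] * dim
    for t in range(1, steps + 1):
        x, y = sample()
        residual = sum(wi * xi for wi, xi in zip(w, x)) - y
        eta = 1.0 / (lam * t)
        # Stochastic gradient: residual * x + lam * w
        w = [wi - eta * (residual * xi + lam * wi) for wi, xi in zip(w, x)]
    return w


def make_sampler(w_true, noise, rng):
    """Return a function drawing (x, y) with x uniform on [-1, 1]^d (toy assumption)."""
    def sample():
        x = [rng.uniform(-1.0, 1.0) for _ in w_true]
        y = sum(a * b for a, b in zip(w_true, x)) + rng.gauss(0.0, noise)
        return x, y
    return sample


def distance(u, v):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


W_TRUE = [1.0, -2.0]  # hypothetical ground-truth weights
LAM = 1.0             # regularization strength (also the assumed strong-convexity scale)
# For x uniform on [-1, 1]^d, E[x x^T] = I/3, so the population
# minimizer of the regularized objective is w_true / (1 + 3*lam).
W_OPT = [wi / (1.0 + 3.0 * LAM) for wi in W_TRUE]

if __name__ == "__main__":
    for steps in (100, 1000, 10000):
        sampler = make_sampler(W_TRUE, 0.1, random.Random(0))
        w = sgd_ridge(sampler, 2, LAM, steps)
        print(steps, round(distance(w, W_OPT), 4))
```

Running the script prints the error after 100, 1000, and 10000 steps, which gives a rough feel for how the error decays with T on this toy problem.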
Automated Inference System for End-To-End Diagnosis of Network Performance Issues in Client-Terminal Devices
Widanapathirana, Chathuranga, Şekercioğlu, Y. Ahmet, Ivanovich, Milosh V., Fitzpatrick, Paul G., Li, Jonathan C.
Traditional network diagnosis methods for Client-Terminal Device (CTD) problems tend to be labor-intensive and time-consuming, and they contribute to increased customer dissatisfaction. In this paper, we propose an automated solution for rapidly diagnosing the root causes of network performance issues in CTDs. Based on a new intelligent inference technique, we create the Intelligent Automated Client Diagnostic (IACD) system, which relies only on the collection of Transmission Control Protocol (TCP) packet traces. Using soft-margin Support Vector Machine (SVM) classifiers, the system (i) distinguishes link problems from client problems and (ii) identifies characteristics unique to the specific fault to report the root cause. The modular design of the system enables support for new access link and fault types. Experimental evaluation demonstrated the capability of the IACD system to distinguish between faulty and healthy links and to diagnose the client faults with 98% accuracy. The system can perform fault diagnosis independent of the user's specific TCP implementation, enabling diagnosis of a diverse range of client devices.
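As a rough illustration of what a soft-margin linear SVM optimizes, the sketch below minimizes the regularized hinge loss by full-batch subgradient descent. It is not the IACD system's classifier or solver; the function names, the 2-D toy points, and the hyperparameters (`lam`, `steps`) are assumptions and do not represent features extracted from TCP traces.

```python
def train_soft_margin_svm(points, labels, lam=0.1, steps=200):
    """Full-batch subgradient descent on
    (lam/2)*||w||^2 + mean(max(0, 1 - y*(w.x + b)))."""
    dim = len(points[0])
    n = len(points)
    w = [0.0] * dim
    b = 0.0
    for t in range(1, steps + 1):
        eta = 1.0 / (lam * t)  # decaying step size
        grad_w = [lam * wi for wi in w]  # gradient of the regularizer
        grad_b = 0.0
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1.0:  # point is inside the margin or misclassified
                for j in range(dim):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wi - eta * gi for wi, gi in zip(w, grad_w)]
        b -= eta * grad_b
    return w, b


def predict(w, b, x):
    """Return +1 or -1 according to the sign of the decision function."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


def accuracy(w, b, points, labels):
    """Fraction of points whose predicted label matches the true label."""
    hits = sum(1 for x, y in zip(points, labels) if predict(w, b, x) == y)
    return hits / len(points)


# Hypothetical toy data: label +1 for one cluster, -1 for the other.
POINTS = [(2, 2), (3, 3), (2, 3), (3, 2), (-2, -2), (-3, -3), (-2, -3), (-3, -2)]
LABELS = [1, 1, 1, 1, -1, -1, -1, -1]

if __name__ == "__main__":
    w, b = train_soft_margin_svm(POINTS, LABELS)
    print("weights:", w, "bias:", b)
    print("training accuracy:", accuracy(w, b, POINTS, LABELS))
```

The parameter `lam` controls the trade-off between a wide margin and margin violations; a smaller value penalizes violations more heavily relative to the norm of `w`.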
Improved brain pattern recovery through ranking approaches
Pedregosa, Fabian, Gramfort, Alexandre, Varoquaux, Gaël, Thirion, Bertrand, Pallier, Christophe, Cauvet, Elodie
The prediction of behavioral information or cognitive states from brain activation images such as those obtained with fMRI can be used to assess the specificity of several brain regions for certain cognitive or perceptual functions. This kind of analysis is implemented by learning a classifier or regression function that fits a given target variable given fMRI activations. The accuracy of this prediction depends on whether it uses the relevant variables i.e. the correct brain regions. Recovering the truly predictive pattern has proven to be challenging from a statistical point of view: the high dimensionality of the data together with the limited number of images makes the problem of brain pattern recovery an ill-posed problem. So far, the approaches proposed to address this issue have relied on linear models, with univariate, i.e. voxel-based, Anova (analysis of variance) for hypothesis testing, or, for predictive modeling, with the choice of a regularizer using a priori domain-specific knowledge, such as the l
Biogeography-Based Informative Gene Selection and Cancer Classification Using SVM and Random Forests
Nikumbh, Sarvesh, Ghosh, Shameek, Jayaraman, Valadi
Microarray cancer gene expression data are very high-dimensional. Reducing the dimensionality helps improve the overall analysis and classification performance. We propose two hybrid techniques, Biogeography-Based Optimization with Random Forests (BBO-RF) and BBO with Support Vector Machines (BBO-SVM), with gene ranking as a heuristic, for microarray gene expression analysis. This heuristic is obtained from an information gain filter ranking procedure. The BBO algorithm generates a population of candidate subsets of genes, as part of an ecosystem of habitats, and employs the migration and mutation processes across multiple generations of the population to improve the classification accuracy. The fitness of each gene subset is assessed by the classifiers, SVM and Random Forests. The performance of these hybrid techniques is evaluated on three cancer gene expression datasets retrieved from the Kent Ridge Biomedical datasets collection and the libSVM data repository. Our results demonstrate that genes selected by the proposed techniques yield classification accuracies comparable to previously reported algorithms.
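The information gain filter mentioned above can be sketched as follows. This is only an illustration of ranking features by information gain, not the authors' implementation and not the BBO search itself; it assumes expression values have already been discretized (here into the hypothetical levels "lo" and "hi"), and all function names and data are made up for the example.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus their entropy conditioned on the feature."""
    groups = defaultdict(list)
    for value, label in zip(feature_values, labels):
        groups[value].append(label)
    n = len(labels)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(rows, labels):
    """Return feature (column) indices sorted by decreasing information gain."""
    columns = list(zip(*rows))  # rows are samples, columns are features
    gains = [information_gain(col, labels) for col in columns]
    return sorted(range(len(columns)), key=lambda j: gains[j], reverse=True)


# Hypothetical discretized expression levels: 4 samples, 3 genes.
ROWS = [
    ("lo", "lo", "lo"),
    ("lo", "hi", "lo"),
    ("hi", "lo", "lo"),
    ("hi", "hi", "hi"),
]
LABELS = [0, 0, 1, 1]  # e.g. 0 = normal, 1 = tumor (illustrative only)

if __name__ == "__main__":
    print(rank_features(ROWS, LABELS))
```

In this toy data, the first gene separates the classes perfectly, the second carries no information about the label, and the third is partially informative, so a ranking like this could be used to seed or bias a subset search.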
Applying Discrete PCA in Data Analysis
Buntine, Wray L., Jakulin, Aleks
Methods for analysis of principal components in discrete data have existed for some time under various names such as grade of membership modelling, probabilistic latent semantic analysis, and genotype inference with admixture. In this paper we explore a number of extensions to the common theory, and present some applications of these methods to common statistical tasks. We show that these methods can be interpreted as a discrete version of ICA. We develop a hierarchical version yielding components at different levels of detail, and additional techniques for Gibbs sampling. We compare the algorithms on a text prediction task using support vector machines, and to information retrieval.
Emerging Applications for Intelligent Diabetes Management
Marling, Cindy (Ohio University) | Wiley, Matthew (University of California, Riverside) | Bunescu, Razvan (Ohio University) | Shubrook, Jay (Ohio University) | Schwartz, Frank (Ohio University)
Diabetes management is a difficult task for patients, who must monitor and control their blood glucose levels in order to avoid serious diabetic complications. It is a difficult task for physicians, who must manually interpret large volumes of blood glucose data to tailor therapy to the needs of each patient. This paper describes three emerging applications that employ AI to ease this task: (1) case-based decision support for diabetes management; (2) machine learning classification of blood glucose plots; and (3) support vector regression for blood glucose prediction. The first application provides decision support by detecting blood glucose control problems and recommending therapeutic adjustments to correct them. The second provides an automated screen for excessive glycemic variability. The third aims to build a hypoglycemia predictor that could alert patients to dangerously low blood glucose levels in time to take preventive action. All are products of the 4 Diabetes Support System™ project, which uses AI to promote the health and wellbeing of people with type 1 diabetes. These emerging applications could potentially benefit 20 million patients who are at risk for devastating complications, thereby improving quality of life and reducing health care cost expenditures.
NewsFinder: Automating an AI News Service
Eckroth, Joshua (The Ohio State University) | Dong, Liang (Clemson University) | Smith, Reid G. (Marathon Oil Corporation) | Buchanan, Bruce G. (University of Pittsburgh)
NewsFinder automates the steps involved in finding, selecting, categorizing, and publishing news stories that meet relevance criteria for the Artificial Intelligence community. The software combines a broad search of online news sources with topic-specific trained models and heuristics. Since August 2010, the program has been used to operate the AI in the News service that is part of the AAAI AITopics website.
Bayesian Multicategory Support Vector Machines
Zhang, Zhihua, Jordan, Michael I.
We show that the multi-class support vector machine (MSVM) proposed by Lee et al. (2004) can be viewed as a MAP estimation procedure under an appropriate probabilistic interpretation of the classifier. We also show that this interpretation can be extended to a hierarchical Bayesian architecture and to a fully-Bayesian inference procedure for multiclass classification based on data augmentation. We present empirical results that show that the advantages of the Bayesian formalism are obtained without a loss in classification accuracy.
Discriminative Learning via Semidefinite Probabilistic Models
Crammer, Koby, Globerson, Amir
Discriminative linear models are a popular tool in machine learning. These can be generally divided into two types: The first is linear classifiers, such as support vector machines, which are well studied and provide state-of-the-art results. One shortcoming of these models is that their output (known as the 'margin') is not calibrated, and cannot be translated naturally into a distribution over the labels. Thus, it is difficult to incorporate such models as components of larger systems, unlike probabilistic based approaches. The second type of approach constructs class conditional distributions using a nonlinearity (e.g. log-linear models), but is occasionally worse in terms of classification error. We propose a supervised learning method which combines the best of both approaches. Specifically, our method provides a distribution over the labels, which is a linear function of the model parameters. As a consequence, differences between probabilities are linear functions, a property which most probabilistic models (e.g. log-linear) do not have. Our model assumes that classes correspond to linear subspaces (rather than to half spaces). Using a relaxed projection operator, we construct a measure which evaluates the degree to which a given vector 'belongs' to a subspace, resulting in a distribution over labels. Interestingly, this view is closely related to similar concepts in quantum detection theory. The resulting models can be trained either to maximize the margin or to optimize average likelihood measures. The corresponding optimization problems are semidefinite programs which can be solved efficiently. We illustrate the performance of our algorithm on real world datasets, and show that it outperforms 2nd order kernel methods.
Small-sample Brain Mapping: Sparse Recovery on Spatially Correlated Designs with Randomization and Clustering
Varoquaux, Gael, Gramfort, Alexandre, Thirion, Bertrand
Functional neuroimaging can measure the brain's response to an external stimulus. It is used to perform brain mapping: identifying from these observations the brain regions involved. This problem can be cast into a linear supervised learning task where the neuroimaging data are used as predictors for the stimulus. Brain mapping is then seen as a support recovery problem. On functional MRI (fMRI) data, this problem is particularly challenging as i) the number of samples is small due to limited acquisition time and ii) the variables are strongly correlated. We propose to overcome these difficulties using sparse regression models over new variables obtained by clustering of the original variables. The use of randomization techniques, e.g. bootstrap samples, and clustering of the variables improves the recovery properties of sparse methods. We demonstrate the benefit of our approach on an extensive simulation study as well as two fMRI datasets.