 Support Vector Machines


A Comparison Study on Nonlinear Dimension Reduction Methods with Kernel Variations: Visualization, Optimization and Classification

arXiv.org Machine Learning

Because of high dimensionality, correlation among covariates, and noise contained in data, dimension reduction (DR) techniques are often employed in the application of machine learning algorithms. Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), and their kernel variants (KPCA, KLDA) are among the most popular DR methods. Recently, Supervised Kernel Principal Component Analysis (SKPCA) has been shown to be another successful alternative. In this paper, brief reviews of these popular techniques are presented first. We then conduct a comparative performance study based on three simulated datasets, after which the performance of the techniques is evaluated through application to a pattern recognition problem in face image analysis. The gender classification problem is considered on MORPH-II and FG-NET, two popular longitudinal face aging databases. Several feature extraction methods are used, including biologically-inspired features (BIF), local binary patterns (LBP), histogram of oriented gradients (HOG), and the Active Appearance Model (AAM). After application of the DR methods, a linear support vector machine (SVM) is deployed, with gender classification accuracy rates exceeding 95% on MORPH-II, competitive with benchmark results. A parallel computational approach is also proposed, attaining faster processing speeds and similar recognition rates on MORPH-II. Our computational approach can be applied to practical gender classification systems and generalized to other face analysis tasks, such as race classification and age prediction.
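As a rough illustration of the kernel variant of PCA mentioned in the abstract above, the following is a minimal sketch of plain (unsupervised) kernel PCA on training data. It is not the paper's SKPCA or its face-analysis pipeline; the RBF kernel, the `gamma` value, the random toy data, and the function names are all assumptions made for the example.

```python
import numpy as np


def rbf_kernel(X, gamma):
    """Pairwise RBF kernel matrix exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))


def kernel_pca(X, n_components=2, gamma=1.0):
    """Project the training points onto the leading kernel principal components."""
    n = X.shape[0]
    K = rbf_kernel(X, gamma)
    # Center the kernel matrix, i.e. center the data in feature space.
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one
    # Eigendecomposition of the symmetric centered kernel matrix.
    eigvals, eigvecs = np.linalg.eigh(Kc)
    idx = np.argsort(eigvals)[::-1][:n_components]
    lam, V = eigvals[idx], eigvecs[:, idx]
    # Training-point projections are the eigenvectors scaled by sqrt(eigenvalue).
    return V * np.sqrt(np.maximum(lam, 0.0))


# Hypothetical toy data: 20 samples with 3 features.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 3))
Z_demo = kernel_pca(X_demo, n_components=2, gamma=0.5)
```

The reduced features `Z_demo` could then be passed to a linear classifier, which is the general pattern the abstract describes (DR followed by a linear SVM).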


A sparse semismooth Newton based augmented Lagrangian method for large-scale support vector machines

arXiv.org Machine Learning

Support vector machines (SVMs) are successful modeling and prediction tools with a variety of applications. Previous work has demonstrated the superiority of SVMs in dealing with high-dimensional, low-sample-size problems. However, the numerical difficulties of SVMs become severe as the sample size increases. Although there exist many solvers for SVMs, only a few of them are designed to exploit the special structures of SVMs. In this paper, we propose a highly efficient sparse semismooth Newton based augmented Lagrangian method for solving a large-scale convex quadratic programming problem with a linear equality constraint and a simple box constraint, which is generated from the dual problems of SVMs. By leveraging the primal-dual error bound result, the fast local convergence rate of the augmented Lagrangian method can be guaranteed. Furthermore, by exploiting the second-order sparsity of the problem when using the semismooth Newton method, the algorithm can efficiently solve the aforementioned difficult problems. Finally, numerical comparisons demonstrate that the proposed algorithm outperforms the current state-of-the-art solvers for large-scale SVMs.
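For orientation, the textbook dual of the soft-margin C-SVM has the structure described in the abstract above: a convex quadratic objective, one linear equality constraint, and a box constraint. The notation below (training pairs $(x_i, y_i)$ with $y_i \in \{-1, +1\}$, kernel $K$, penalty $C$, all-ones vector $e$) is assumed for illustration; the paper's own formulation may differ in detail.

```latex
\min_{\alpha \in \mathbb{R}^n} \;\; \frac{1}{2}\,\alpha^\top Q\,\alpha - e^\top \alpha
\quad \text{subject to} \quad
y^\top \alpha = 0, \qquad 0 \le \alpha \le C e,
\qquad \text{where } Q_{ij} = y_i\, y_j\, K(x_i, x_j).
```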


Using Machine Learning in Venture Capital

#artificialintelligence

I have already (partially) reviewed previous studies in which data have been shown to help identify signals that are relevant for assessing the success potential of a startup. Even though the list is quite comprehensive, each study tends to look at one single factor and a couple of different success scenarios (namely, acquisition and IPO). In our work, we tried to take a more holistic view and used over 120,000 companies to spot signals not only for acquisitions and IPOs but also to compute the probability of raising a subsequent round of funding or shutting the startup down. In the same fashion as backtesting, we created a time-aware approach: we analyzed companies that were no more than four years old by 2015 and tried to predict their success in the following three years. We also used more than a hundred variables as possible explanatory indicators of success, as well as five different models: Support Vector Machines (SVM); Decision Trees (DT); Random Forests (RF); Extremely Randomized Trees (ERT); and Gradient Tree Boosting (GTB).


Geometric Online Adaptation: Graph-Based OSFS for Streaming Samples

arXiv.org Machine Learning

Feature selection seeks a curated subset of available features such that they contain sufficient discriminative information for a given learning task. Online streaming feature selection (OSFS) further extends this to the streaming scenario where the model gets only a single pass at features, one at a time. While this problem setting allows for training high-performance models with low computational and storage requirements, it also assumes that there is a fixed number of samples, an assumption that is often violated in real-world problems. In this paper, we consider a new setting called Online Streaming Feature Selection with Streaming Samples (OSFS-SS) with a fixed class label space, where both the features and the samples are simultaneously streamed. We extend the state-of-the-art OSFS method to work in this setting. Furthermore, we introduce a novel algorithm that has applications in both the OSFS and OSFS-SS settings, called Geometric Online Adaptation (GOA), which uses a graph-based class conditional geometric dependency (CGD) criterion to measure feature relevance and maintain a minimal feature subset with relatively high classification performance. We evaluate the proposed GOA algorithm on both simulated and real-world datasets, highlighting how in both the OSFS and OSFS-SS settings it achieves higher performance while maintaining smaller feature subsets than relevant baselines.


Tutorial on Implied Posterior Probability for SVMs

arXiv.org Machine Learning

Department of Data Science, Medical Data Science Ltd., Bulgaria

Abstract

Implied posterior probability of a given model (say, Support Vector Machines (SVM)) at a point x is an estimate of the class posterior probability pertaining to the class of functions of the model applied to a given dataset. It can be regarded as a score (or estimate) for the true posterior probability, which can then be calibrated/mapped onto expected (non-implied by the model) posterior probability implied by the underlying functions, which have generated the data. In this tutorial we discuss how to compute implied posterior probabilities of SVMs for the binary classification case as well as how to calibrate them via a standard method of isotonic regression.

Keywords: Posterior probability, Bayes rule, Classification, SVMs

1. Introduction

The implied posterior probability method for estimating class posterior probability has recently been proposed (Nalbantov and Ivanov, 2019). The method provides a score (or estimate) for the true posterior probability, which can then be calibrated/mapped onto expected (non-implied by the model) posterior probability implied by the underlying functions, which have generated the data. The main difference with other methods for solving this problem is the non-reliance on the original model built on the data to estimate posterior probabilities for points which do not belong to the separation surface of the model. Rather, the estimates are based on the class of functions used to build the (original) model, as applied to different versions of the dataset, where the relative weight of the instances varies between the classes. For each such relative weight a different model is built, which is relevant for the estimation of a particular value of the posterior probability.
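The calibration step mentioned above, isotonic regression, can be sketched with the pool-adjacent-violators algorithm. This is a minimal illustration only, not the tutorial's procedure for computing implied posterior probabilities: the scores and labels are made-up toy values, the scores are assumed to be distinct, and the function names are hypothetical.

```python
from bisect import bisect_right


def isotonic_fit(scores, labels):
    """Fit a non-decreasing step function of the score to 0/1 labels
    using pool-adjacent-violators. Assumes distinct scores."""
    pairs = sorted(zip(scores, labels))
    blocks = []  # each block holds [sum of labels, number of points]
    for _, y in pairs:
        blocks.append([float(y), 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    sorted_scores = [p[0] for p in pairs]
    return sorted_scores, fitted


def isotonic_predict(sorted_scores, fitted, x):
    """Calibrated probability for a new score x (step-function lookup)."""
    idx = bisect_right(sorted_scores, x)
    return fitted[0] if idx == 0 else fitted[idx - 1]


# Hypothetical uncalibrated scores (e.g. decision values) and true labels.
demo_scores = [-2.0, -1.0, -0.5, 0.3, 0.8, 1.5]
demo_labels = [0, 0, 1, 0, 1, 1]
demo_x, demo_fit = isotonic_fit(demo_scores, demo_labels)
```

The fitted values are monotone in the score by construction, which is what makes them usable as a mapping from scores to calibrated probabilities.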


Stock Market Forecasting Based on Text Mining Technology: A Support Vector Machine Method

arXiv.org Machine Learning

News items have a significant impact on stock markets, but the mechanisms are obscure. Many previous works have aimed at finding accurate stock market forecasting models. In this paper, we apply text mining and sentiment analysis to Chinese online financial news to predict Chinese stock tendency and stock prices using a support vector machine (SVM). First, we collect 2,302,692 news items dating from 1/1/2008 to 1/1/2015. Second, based on this dataset, a domain-specific stop-word dictionary and a precise sentiment dictionary are formed. Third, we propose a forecasting model using SVM. For the SVM implementation, we also propose two-parameter optimization algorithms to search for the best initial parameter setting. The results show that parameter G has the main effect, while the effect of parameter C is not obvious. Furthermore, support vector regression (SVR) models for different Chinese stocks are similar, whereas in support vector classification (SVC) models the best parameters differ considerably. A series of contrast experiments shows that: a) news has a significant influence on the stock market; b) an expanded input vector for the case when a day has no news data is better than the normal input in SVR, yet worse in SVC; c) SVR shows a very high degree of fit in predicting stock fluctuation, although the result has some time lag; d) the time lag of the news effect on the stock market is less than two days; e) in SVC, historical stock data has a most effective time lag of about 10 days, whereas in SVR this effect is not obvious. In addition, based on the special structure of the input vector, we design a method to calculate the financial source impact factor. The results suggest that news quality and audience size both have a significant effect on the source impact factor. Moreover, for Chinese investors, traditional media has more influence than digital media.


A Survey of Machine Learning Applied to Computer Architecture Design

arXiv.org Artificial Intelligence

Machine learning has enabled significant benefits in diverse fields, but, with a few exceptions, has had limited impact on computer architecture. Recent work, however, has explored broader applicability for design, optimization, and simulation. Notably, machine learning based strategies often surpass prior state-of-the-art analytical, heuristic, and human-expert approaches. This paper reviews machine learning applied system-wide to simulation and run-time optimization, and in many individual components, including memory systems, branch predictors, networks-on-chip, and GPUs. The paper further analyzes current practice to highlight useful design strategies and identify areas for future work, based on optimized implementation strategies, opportune extensions to existing work, and ambitious long term possibilities. Taken together, these strategies and techniques present a promising future for increasingly automated architectural design.


Hyperspectral Image Classification With Context-Aware Dynamic Graph Convolutional Network

arXiv.org Machine Learning

In hyperspectral image (HSI) classification, spatial context has demonstrated its significance in achieving promising performance. However, conventional spatial context-based methods simply assume that spatially neighboring pixels should correspond to the same land-cover class, so they often fail to correctly discover the contextual relations among pixels in complex situations, leading to imperfect classification results on irregular or inhomogeneous regions such as class boundaries. To address this deficiency, we develop a new HSI classification method based on the recently proposed Graph Convolutional Network (GCN), as it can flexibly encode the relations among arbitrarily structured non-Euclidean data. Unlike the traditional GCN, our method adopts two novel strategies to further exploit the contextual relations for accurate HSI classification. First, since the receptive field of the traditional GCN is often limited to a fairly small neighborhood, we propose to capture long-range contextual relations in HSI by performing successive graph convolutions on a learned region-induced graph which is transformed from the original 2D image grids. Second, we refine the graph edge weights and the connective relationships among image regions by learning the improved adjacency matrix and the 'edge filter', so that the graph can be gradually refined to adapt to the representations generated by each graph convolutional layer. Such an updated graph will in turn result in accurate region representations, and vice versa. The experiments carried out on three real-world benchmark datasets demonstrate that the proposed method yields significant improvement in classification performance when compared with some state-of-the-art approaches.


A Radiomics Approach to Computer-Aided Diagnosis with Cardiac Cine-MRI

arXiv.org Machine Learning

Using expert visualization or conventional clinical indices can lack accuracy for borderline classifications. Advanced statistical approaches based on eigen-decomposition have been mostly concerned with shape and motion indices. In this paper, we present a new approach to identify cardiovascular diseases (CVDs) from cine-MRI by estimating large pools of radiomic features (statistical, shape and textural features) encoding relevant changes in anatomical and image characteristics due to CVDs. The calculated cine-MRI radiomic features are assessed using sequential forward feature selection to identify the most relevant ones for given CVD classes (e.g. myocardial infarction, cardiomyopathy, abnormal right ventricle). Finally, advanced machine learning is applied to suitably integrate the selected radiomics for final multi-feature classification based on Support Vector Machines (SVMs). The proposed technique was trained and cross-validated using 100 cine-MRI cases corresponding to five different cardiac classes from the ACDC MICCAI 2017 challenge (https://www.creatis.insa-lyon.fr/Challenge/acdc/index.html). All cases were correctly classified in this preliminary study, indicating the potential of using large-scale radiomics for MRI-based diagnosis of CVDs.


Conditions for Unnecessary Logical Constraints in Kernel Machines

arXiv.org Artificial Intelligence

A main property of support vector machines is that only a small portion of the training data, the so-called support vectors, is significant in determining the maximum-margin separating hyperplane in the feature space. In a similar way, in the general scheme of learning from constraints, where possibly several constraints are considered, some of them may turn out to be unnecessary with respect to the learning optimization, even if they are active for a given optimal solution. In this paper we extend the definition of support vector to support constraint, and we provide some criteria to determine which constraints can be removed from the learning problem while still yielding the same optimal solutions. In particular, we discuss the case of logical constraints expressed by Łukasiewicz logic, where both inferential and algebraic arguments can be considered. Some theoretical results that characterize the concept of unnecessary constraint are proved and explained by means of examples.
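The support-vector property referred to above can be stated compactly in the standard SVM notation, which is assumed here rather than taken from the paper: with optimal dual multipliers $\alpha_i^*$, labels $y_i \in \{-1, +1\}$, feature map $\phi$, and kernel $K(x, x') = \phi(x)^\top \phi(x')$, only training points with a nonzero multiplier enter the solution.

```latex
w^* = \sum_{i=1}^{n} \alpha_i^*\, y_i\, \phi(x_i),
\qquad
f(x) = \operatorname{sign}\!\Big( \sum_{i \,:\, \alpha_i^* > 0} \alpha_i^*\, y_i\, K(x_i, x) + b^* \Big),
\qquad
\text{support vectors} = \{\, x_i : \alpha_i^* > 0 \,\}.
```

Removing any training point with $\alpha_i^* = 0$ leaves $w^*$ unchanged, which is the behavior the abstract generalizes from data points to constraints.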