Support Vector Machines
Learning Data-adaptive Nonparametric Kernels
Liu, Fanghui, Huang, Xiaolin, Gong, Chen, Yang, Jie, Li, Li
Traditional kernels or their combinations are often not sufficiently flexible to fit the data in complicated practical tasks. In this paper, we present a Data-Adaptive Nonparametric Kernel (DANK) learning framework by imposing an adaptive matrix on the kernel/Gram matrix in an entry-wise strategy. Since we do not specify the formulation of the adaptive matrix, each entry in it can be directly and flexibly learned from the data. Therefore, the solution space of the learned kernel is largely expanded, which makes DANK flexible to adapt to the data. Specifically, the proposed kernel learning framework can be seamlessly embedded to support vector machines (SVM) and support vector regression (SVR), which has the capability of enlarging the margin between classes and reducing the model generalization error. Theoretically, we demonstrate that the objective function of our devised model is gradient-Lipschitz continuous. Thereby, the training process for kernel and parameter learning in SVM/SVR can be efficiently optimized in a unified framework. Further, to address the scalability issue in DANK, a decomposition-based scalable approach is developed, of which the effectiveness is demonstrated by both empirical studies and theoretical guarantees. Experimentally, our method outperforms other representative kernel learning based algorithms on various classification and regression benchmark datasets.
Speaker Fluency Level Classification Using Machine Learning Techniques
Preciado-Grijalva, Alan, Brena, Ramon F.
Level assessment for foreign language students is necessary for putting them in the right level group, furthermore, interviewing students is a very time-consuming task, so we propose to automate the evaluation of speaker fluency level by implementing machine learning techniques. This work presents an audio processing system capable of classifying the level of fluency of non-native English speakers using five different machine learning models. As a first step, we have built our own dataset, which consists of labeled audio conversations in English between people ranging in different fluency domains/classes (low, intermediate, high). We segment the audio conversations into 5s non-overlapped audio clips to perform feature extraction on them. We start by extracting Mel cepstral coefficients from the audios, selecting 20 coefficients is an appropriate quantity for our data. We thereafter extracted zero-crossing rate, root mean square energy and spectral flux features, proving that this improves model performance. Out of a total of 1424 audio segments, with 70% training data and 30% test data, one of our trained models (support vector machine) achieved a classification accuracy of 94.39%, whereas the other four models passed an 89% classification accuracy threshold.
Ensemble Learning Applied to Classify GPS Trajectories of Birds into Male or Female
We describe our first-place solution to the Animal Behavior Challenge (ABC 2018) on predicting gender of bird from its GPS trajectory. The task consisted in predicting the gender of shearwater based on how they navigate themselves across a big ocean. The trajectories are collected from GPS loggers attached on shearwaters' body, and represented as a variable-length sequence of GPS points (latitude and longitude), and associated meta-information, such as the sun azimuth, the sun elevation, the daytime, the elapsed time on each GPS location after starting the trip, the local time (date is trimmed), and the indicator of the day starting the from the trip. We used ensemble of several variants of Gradient Boosting Classifier along with Gaussian Process Classifier and Support Vector Classifier after extensive feature engineering and we ranked first out of 74 registered teams. The variants of Gradient Boosting Classifier we tried are CatBoost (Developed by Yandex), LightGBM (Developed by Microsoft), XGBoost (Developed by Distributed Machine Learning Community). Our approach could easily be adapted to other applications in which the goal is to predict a classification output from a variable-length sequence.
Predicting Extubation Readiness in Extreme Preterm Infants based on Patterns of Breathing
Onu, Charles C., Kanbar, Lara J., Shalish, Wissam, Brown, Karen A., Sant'Anna, Guilherme M., Kearney, Robert E., Precup, Doina
Abstract-- Extremely preterm infants commonly require intubation and invasive mechanical ventilation after birth. While the duration of mechanical ventilation should be minimized in order to avoid complications, extubation failure is associated with increases in morbidities and mortality. As part of a prospective observational study aimed at developing an accurate predictor of extubation readiness, Markov and semi-Markov chain models were applied to gain insight into the respiratory patterns of these infants, with more robust timeseries modeling using semi-Markov models. This model revealed interesting similarities and differences between newborns who succeeded extubation and those who failed. The parameters of the model were further applied to predict extubation readiness via generative (joint likelihood) and discriminative (support vector machine) approaches. Results showed that up to 84% of infants who failed extubation could have been accurately identified prior to extubation.
Multiclass Universum SVM
Dhar, Sauptik, Cherkassky, Vladimir, Shah, Mohak
We introduce Universum learning for multiclass problems and propose a novel formulation for multiclass universum SVM (MU-SVM). We also propose an analytic span bound for model selection with almost 2-4x faster computation times than standard resampling techniques. We empirically demonstrate the efficacy of the proposed MUSVM formulation on several real world datasets achieving > 20% improvement in test accuracies compared to multi-class SVM.
Reproducible evaluation of classification methods in Alzheimer's disease: framework and application to MRI and PET data
Samper-González, Jorge, Burgos, Ninon, Bottani, Simona, Fontanella, Sabrina, Lu, Pascal, Marcoux, Arnaud, Routier, Alexandre, Guillon, Jérémy, Bacci, Michael, Wen, Junhao, Bertrand, Anne, Bertin, Hugo, Habert, Marie-Odile, Durrleman, Stanley, Evgeniou, Theodoros, Colliot, Olivier, Initiative, for the Alzheimer's Disease Neuroimaging, Biomarkers, the Australian Imaging, ageing, Lifestyle flagship study of
A large number of papers have introduced novel machine learning and feature extraction methods for automatic classification of AD. However, they are difficult to reproduce because key components of the validation are often not readily available. These components include selected participants and input data, image preprocessing and cross-validation procedures. The performance of the different approaches is also difficult to compare objectively. In particular, it is often difficult to assess which part of the method provides a real improvement, if any. We propose a framework for reproducible and objective classification experiments in AD using three publicly available datasets (ADNI, AIBL and OASIS). The framework comprises: i) automatic conversion of the three datasets into BIDS format, ii) a modular set of preprocessing pipelines, feature extraction and classification methods, together with an evaluation framework, that provide a baseline for benchmarking the different components. We demonstrate the use of the framework for a large-scale evaluation on 1960 participants using T1 MRI and FDG PET data. In this evaluation, we assess the influence of different modalities, preprocessing, feature types, classifiers, training set sizes and datasets. Performances were in line with the state-of-the-art. FDG PET outperformed T1 MRI for all classification tasks. No difference in performance was found for the use of different atlases, image smoothing, partial volume correction of FDG PET images, or feature type. Linear SVM and L2-logistic regression resulted in similar performance and both outperformed random forests. The classification performance increased along with the number of subjects used for training. Classifiers trained on ADNI generalized well to AIBL and OASIS. All the code of the framework and the experiments is publicly available at: https://gitlab.icm-institute.org/aramislab/AD-ML.
Faster Support Vector Machines
Schlag, Sebastian, Schmitt, Matthias, Schulz, Christian
The time complexity of support vector machines (SVMs) prohibits training on huge data sets with millions of samples. Recently, multilevel approaches to train SVMs have been developed to allow for time efficient training on huge data sets. While regular SVMs perform the entire training in one - time consuming - optimization step, multilevel SVMs first build a hierarchy of problems decreasing in size that resemble the original problem and then train an SVM model for each hierarchy level benefiting from the solved models of previous levels. We present a faster multilevel support vector machine that uses a label propagation algorithm to construct the problem hierarchy. Extensive experiments show that our new algorithm achieves speed-ups up to two orders of magnitude while having similar or better classification quality over state-of-the-art algorithms.
A bagging and importance sampling approach to Support Vector Machines
Bárcenas, R., Gónzalez--Lima, M. D., Quiroz, A. J.
An importance sampling and bagging approach to solving the support vector machine (SVM) problem in the context of large databases is presented and evaluated. Our algorithm builds on the nearest neighbors ideas presented in Camelo at al. (2015). As in that reference, the goal of the present proposal is to achieve a faster solution of the SVM problem without a significance loss in the prediction error. The performance of the methodology is evaluated in benchmark examples and theoretical aspects of subsample methods are discussed.
How Complex is your classification problem? A survey on measuring classification complexity
Lorena, Ana C., Garcia, Luís P. F., Lehmann, Jens, Souto, Marcilio C. P., Ho, Tin K.
Extracting characteristics from the training datasets of classification problems has proven effective in a number of meta-analyses. Among them, measures of classification complexity can estimate the difficulty in separating the data points into their expected classes. Descriptors of the spatial distribution of the data and estimates of the shape and size of the decision boundary are among the existent measures for this characterization. This information can support the formulation of new data-driven pre-processing and pattern recognition techniques, which can in turn be focused on challenging characteristics of the problems. This paper surveys and analyzes measures which can be extracted from the training datasets in order to characterize the complexity of the respective classification problems. Their use in recent literature is also reviewed and discussed, allowing to prospect opportunities for future work in the area. Finally, descriptions are given on an R package named Extended Complexity Library (ECoL) that implements a set of complexity measures and is made publicly available.
Data-driven polynomial chaos expansion for machine learning regression
Torre, E., Marelli, S., Embrechts, P., Sudret, B.
We present a regression technique for data driven problems based on polynomial chaos expansion (PCE). PCE is a popular technique in the field of uncertainty quantification (UQ), where it is typically used to replace a runnable but expensive computational model subject to random inputs with an inexpensive-to-evaluate polynomial function. The metamodel obtained enables a reliable estimation of the statistics of the output, provided that a suitable probabilistic model of the input is available. In classical machine learning (ML) regression settings, however, the system is only known through observations of its inputs and output, and the interest lies in obtaining accurate pointwise predictions of the latter. Here, we show that a PCE metamodel purely trained on data can yield pointwise predictions whose accuracy is comparable to that of other ML regression models, such as neural networks and support vector machines. The comparisons are performed on benchmark datasets available from the literature. The methodology also enables the quantification of the output uncertainties and is robust to noise. Furthermore, it enjoys additional desirable properties, such as good performance for small training sets and simplicity of construction, with only little parameter tuning required. In the presence of statistically dependent inputs, we investigate two ways to build the PCE, and show through simulations that one approach is superior to the other in the stated settings.