Support Vector Machines
Forecasting the movements of Bitcoin prices: an application of machine learning algorithms
Pabuccu, Hakan, Ongan, Serdar, Ongan, Ayse
Cryptocurrencies such as Bitcoin are among the most controversial and complex technological innovations in today's financial system. This study aims to forecast the movements of Bitcoin prices with a high degree of accuracy. To this end, four Machine Learning (ML) algorithms are applied, namely Support Vector Machines (SVM), Artificial Neural Networks (ANN), Naive Bayes (NB) and Random Forest (RF), with logistic regression (LR) as a benchmark model. To test these algorithms, a discrete dataset was created and used in addition to the existing continuous dataset. Algorithm performance was evaluated with the F statistic, accuracy, the Mean Absolute Error (MAE), the Root Mean Square Error (RMSE) and the Relative Absolute Error (RAE). The t test was used to compare the performance of the SVM, ANN, NB and RF with that of the LR. Empirical findings reveal that in the continuous dataset the RF has the highest forecasting performance and the NB the lowest, while in the discrete dataset the ANN has the highest and the NB the lowest. Furthermore, the discrete dataset improves the overall forecasting performance of all algorithms (models) estimated.
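The abstract reports MAE, RMSE and RAE alongside accuracy. The sketch below shows how these can be computed for a binary up/down forecast, assuming the model outputs a probability of an upward move that is scored against the observed 0/1 outcome (a common convention when error metrics are reported for classifiers). RAE is computed here as the relative absolute error, i.e. total absolute error divided by that of a naive model that always predicts the mean. The function names and data are hypothetical.

```python
import math

def error_metrics(actual, predicted):
    """Return (MAE, RMSE, RAE) for paired numeric sequences."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    abs_total = sum(abs(e) for e in errors)
    mae = abs_total / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mean_actual = sum(actual) / n
    # RAE: total absolute error relative to a naive model that always predicts the mean
    rae = abs_total / sum(abs(a - mean_actual) for a in actual)
    return mae, rmse, rae

def accuracy(actual, predicted_labels):
    """Share of cases where the predicted class equals the observed class."""
    return sum(a == p for a, p in zip(actual, predicted_labels)) / len(actual)

# Hypothetical daily movements (1 = up, 0 = down) and predicted probabilities of "up"
actual = [1, 0, 1, 1, 0]
prob_up = [0.9, 0.2, 0.6, 0.4, 0.1]
labels = [1 if p >= 0.5 else 0 for p in prob_up]

mae, rmse, rae = error_metrics(actual, prob_up)
print(f"MAE={mae:.3f} RMSE={rmse:.3f} RAE={rae:.3f} accuracy={accuracy(actual, labels):.2f}")
```

An RAE below 1 means the model beats the constant-mean baseline.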
Compositional optimization of quantum circuits for quantum kernels of support vector machines
Torabian, Elham, Krems, Roman V.
While quantum machine learning (ML) has been proposed as one of the most promising applications of quantum computing, how to build quantum ML models that outperform classical ML remains a major open question. Here, we demonstrate a Bayesian algorithm for constructing quantum kernels for support vector machines (SVM) that adapts quantum gate sequences to data. The algorithm incrementally increases the complexity of quantum circuits by appending quantum gates, using the Bayesian information criterion as the circuit selection metric and Bayesian optimization to tune the parameters of the locally optimal quantum circuits identified. The goal is to build quantum kernels for SVM that can solve classification problems with as little training data as possible. For the classification problems considered here, the resulting quantum models significantly outperform optimized classical models with conventional kernels.
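A quantum kernel is typically the fidelity |<psi(x)|psi(x')>|^2 between data-encoding quantum states. The sketch below simulates one such kernel classically for a fixed, hypothetical feature map that encodes feature i as an RY(x_i) rotation on qubit i; because this state is a product state, the fidelity has a closed form. It does not implement the paper's Bayesian circuit-construction algorithm, which adapts the gate sequence to the data.

```python
import numpy as np

def angle_encoding_kernel(X1, X2):
    """Fidelity kernel |<psi(x)|psi(x')>|^2, simulated classically, for the product
    state in which qubit i is prepared as RY(x_i)|0>.

    Each qubit is in cos(x_i/2)|0> + sin(x_i/2)|1>, so the single-qubit overlap is
    cos((x_i - x'_i)/2) and the fidelity is the product of its squares.
    """
    diff = X1[:, None, :] - X2[None, :, :]  # pairwise feature differences
    return np.prod(np.cos(diff / 2.0) ** 2, axis=-1)

# Hypothetical data: 6 samples with 3 features, one qubit per feature
rng = np.random.default_rng(0)
X = rng.uniform(0.0, np.pi, size=(6, 3))
K = angle_encoding_kernel(X, X)
print(np.round(K, 3))
```

The resulting Gram matrix is symmetric and positive semidefinite, so it can be handed to an SVM that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`.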
Py-Feat: Python Facial Expression Analysis Toolbox
Cheong, Jin Hyun, Jolly, Eshin, Xie, Tiankang, Byrne, Sophie, Kenney, Matthew, Chang, Luke J.
Studying facial expressions is a notoriously difficult endeavor. Recent advances in the field of affective computing have yielded impressive progress in automatically detecting facial expressions from pictures and videos. However, much of this work has yet to be widely disseminated in social science domains such as psychology. Current state-of-the-art models require considerable domain expertise that is not traditionally incorporated into social science training programs. Furthermore, there is a notable absence of user-friendly, open-source software that provides a comprehensive set of tools and functions to support facial expression research. In this paper, we introduce Py-Feat, an open-source Python toolbox that provides support for detecting, preprocessing, analyzing, and visualizing facial expression data. Py-Feat makes it easy for domain experts to disseminate and benchmark computer vision models, and for end users to quickly process, analyze, and visualize facial expression data. We hope this platform will facilitate increased use of facial expression data in human behavior research.
pystacked: Stacking generalization and machine learning in Stata
Ahrens, Achim, Hansen, Christian B., Schaffer, Mark E.
pystacked implements stacked generalization (Wolpert, 1992) for regression and binary classification via Python's scikit-learn. Stacking combines multiple supervised machine learners (the "base" or "level-0" learners) into a single learner. The currently supported base learners include regularized regression, random forest, gradient boosted trees, support vector machines, and feed-forward neural nets (multi-layer perceptron). pystacked can also be used as a "regular" machine learning program to fit a single base learner and thus provides an easy-to-use API for scikit-learn's machine learning algorithms.
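The stacking idea itself can be illustrated without Stata or scikit-learn. The NumPy sketch below assumes two hypothetical level-0 learners (least-squares linear regression and k-nearest-neighbour regression) and, as an illustrative choice, an unconstrained least-squares level-1 learner fitted on out-of-fold predictions; pystacked's own defaults and options are not reproduced here.

```python
import numpy as np

def fit_linear(X, y):
    """Level-0 learner 1: ordinary least squares with an intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Z: np.column_stack([np.ones(len(Z)), Z]) @ coef

def fit_knn(X, y, k=5):
    """Level-0 learner 2: k-nearest-neighbour regression (mean of the k closest targets)."""
    def predict(Z):
        dist = ((Z[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
        nearest = np.argsort(dist, axis=1)[:, :k]
        return y[nearest].mean(axis=1)
    return predict

BASE_LEARNERS = [fit_linear, fit_knn]

def fit_stack(X, y, n_folds=5, seed=0):
    """Stacked generalization: a level-1 learner combines out-of-fold level-0 predictions."""
    n = len(X)
    folds = np.array_split(np.random.default_rng(seed).permutation(n), n_folds)
    oof = np.zeros((n, len(BASE_LEARNERS)))
    for held_out in folds:
        train = np.setdiff1d(np.arange(n), held_out)
        for j, fit in enumerate(BASE_LEARNERS):
            # Each row is predicted by a model that never saw it during fitting
            oof[held_out, j] = fit(X[train], y[train])(X[held_out])
    # Level-1 learner: plain least-squares weights on the out-of-fold predictions
    weights, *_ = np.linalg.lstsq(oof, y, rcond=None)
    # Refit the base learners on all data for use at prediction time
    final_models = [fit(X, y) for fit in BASE_LEARNERS]

    def predict(Z):
        return np.column_stack([m(Z) for m in final_models]) @ weights

    return predict, weights

# Hypothetical 1-D regression problem: a linear trend plus a nonlinear part
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(200, 1))
y = 0.5 * X[:, 0] + np.sin(2 * X[:, 0]) + rng.normal(0, 0.1, size=200)
predict, weights = fit_stack(X, y)
print("level-1 weights:", weights)
```

Fitting the level-1 learner on out-of-fold rather than in-sample predictions keeps it from simply favouring whichever base learner overfits most. In scikit-learn, `StackingRegressor` and `StackingClassifier` in `sklearn.ensemble` follow the same pattern.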
Tensorized LSSVMs for Multitask Regression
Liu, Jiani, Tao, Qinghua, Zhu, Ce, Liu, Yipeng, Suykens, Johan A. K.
Multitask learning (MTL) can exploit the relatedness between multiple tasks to improve performance. The advent of multimodal data allows tasks to be referenced by multiple indices. High-order tensors can provide efficient representations for such tasks while preserving structural task relations. In this paper, a new MTL method is proposed by leveraging low-rank tensor analysis and constructing tensorized Least Squares Support Vector Machines, namely the tLSSVM-MTL, in which multilinear modelling and its nonlinear extensions can be flexibly applied. We employ a high-order tensor for all the weights, with each mode relating to an index, and factorize it with a CP decomposition, assigning a shared factor to all tasks and retaining task-specific latent factors along each index. An alternating algorithm is then derived for the nonconvex optimization, in which each resulting subproblem is solved as a linear system. Experimental results demonstrate the promising performance of our tLSSVM-MTL.
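tLSSVM-MTL builds on Least Squares SVMs, whose training reduces to solving a linear system. For orientation, the sketch below solves the standard single-task LS-SVM regression system [[0, 1^T], [1, Omega + I/gamma]] [b; alpha] = [0; y], where Omega_ij = k(x_i, x_j). It is not the tensorized multitask model; the RBF kernel, the hyper-parameter values and the data are assumptions.

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

def lssvm_fit(X, y, gamma=10.0, sigma=1.0):
    """Standard LS-SVM regression: solve one (n+1) x (n+1) linear system for (b, alpha)."""
    n = len(X)
    M = np.zeros((n + 1, n + 1))
    M[0, 1:] = 1.0  # constraint row: sum(alpha) = 0
    M[1:, 0] = 1.0  # bias column
    M[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    rhs = np.concatenate([[0.0], y])
    solution = np.linalg.solve(M, rhs)
    b, alpha = solution[0], solution[1:]

    def predict(Z):
        # f(z) = sum_i alpha_i k(z, x_i) + b
        return rbf_kernel(Z, X, sigma) @ alpha + b

    return predict, alpha, b

# Hypothetical 1-D example: 40 noise-free samples of a sinc curve
X = np.linspace(-3, 3, 40)[:, None]
y = np.sinc(X[:, 0])
predict, alpha, b = lssvm_fit(X, y)
print("max training residual:", np.abs(y - predict(X)).max())
```

Because the squared-error loss replaces the SVM's inequality constraints with equalities, each training residual equals alpha_i / gamma, and no quadratic programming solver is needed.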
Agent-based Collaborative Random Search for Hyper-parameter Tuning and Global Function Optimization
Esmaeili, Ahmad, Ghorrati, Zahra, Matson, Eric T.
Almost all Machine Learning (ML) algorithms comprise a set of hyper-parameters that control their learning process and the quality of their resulting models. The number of hidden units, the learning rate and the minibatch size in neural networks; the kernel parameters and the amount of regularization penalty in support vector machines; and the maximum depth, the sample split criteria and the number of features used in decision trees are a few common examples of hyper-parameters that need to be configured for the corresponding learning algorithms. Given a specific ML algorithm and a dataset, one can build countless models, each with potentially different performance and/or learning speed, by assigning different values to the algorithm's hyper-parameters. While hyper-parameters provide ultimate flexibility in using ML algorithms in different scenarios, they also account for most failures and tedious development procedures. Unsurprisingly, numerous studies and practices in the machine learning community are devoted to hyper-parameter optimization. The most straightforward yet difficult approach uses expert knowledge to identify potentially better candidates in hyper-parameter search spaces to evaluate and use. The availability of expert knowledge and the difficulty of generating reproducible results are among the primary limitations of such manual search [1], particularly because applying a learning algorithm to different datasets likely requires different sets of hyper-parameter values [2].
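As a baseline for the search strategies discussed above, the sketch below implements plain random search over a box of hyper-parameters, sampling scale parameters such as the SVM penalty C and the RBF width gamma log-uniformly. It is not the paper's agent-based collaborative method; `toy_loss` is a hypothetical stand-in for a validation error.

```python
import math
import random

def random_search(objective, space, n_trials=200, seed=0):
    """Plain random search: sample each hyper-parameter independently, keep the best.

    `space` maps a name to (low, high, log_scale)."""
    rng = random.Random(seed)
    best_params, best_score = None, math.inf
    for _ in range(n_trials):
        params = {}
        for name, (low, high, log_scale) in space.items():
            if log_scale:
                # Uniform in log10-space, suitable for parameters spanning orders of magnitude
                params[name] = 10 ** rng.uniform(math.log10(low), math.log10(high))
            else:
                params[name] = rng.uniform(low, high)
        score = objective(**params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical validation-loss surface with its minimum at C = 10, gamma = 0.01
def toy_loss(C, gamma):
    return (math.log10(C) - 1) ** 2 + (math.log10(gamma) + 2) ** 2

space = {"C": (1e-2, 1e3, True), "gamma": (1e-4, 1e1, True)}
best, score = random_search(toy_loss, space)
print(best, score)
```

In a real tuning run, `objective` would train a model with the given hyper-parameters and return its cross-validated error.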
EZtune: A Package for Automated Hyperparameter Tuning in R
Statistical learning models have been growing in popularity in recent years. Many of these models have hyperparameters that must be tuned for the models to perform well, and tuning them is not trivial. EZtune is an R package with a simple user interface that can tune support vector machines, adaboost, gradient boosting machines, and elastic net. We first provide a brief summary of the models that EZtune can tune, including a discussion of each of their hyperparameters. We then compare the ease of using EZtune, caret, and tidymodels. This is followed by a comparison of the accuracy and computation times of models tuned with EZtune and tidymodels. We conclude with a demonstration of how EZtune can be used to help select a final model with optimal predictive power. Our comparison shows that EZtune can tune support vector machines and gradient boosting machines, and that EZtune provides a user interface that is easy to use for users new to statistical learning models or R.
Audio-based AI classifiers show no evidence of improved COVID-19 screening over simple symptoms checkers
Coppock, Harry, Nicholson, George, Kiskin, Ivan, Koutra, Vasiliki, Baker, Kieran, Budd, Jobie, Payne, Richard, Karoune, Emma, Hurley, David, Titcomb, Alexander, Egglestone, Sabrina, Cañadas, Ana Tendero, Butler, Lorraine, Jersakova, Radka, Mellor, Jonathon, Patel, Selina, Thornley, Tracey, Diggle, Peter, Richardson, Sylvia, Packham, Josef, Schuller, Björn W., Pigoli, Davide, Gilmour, Steven, Roberts, Stephen, Holmes, Chris
Recent work has reported that respiratory audio-trained AI classifiers can accurately predict SARS-CoV-2 infection status. Here, we undertake a large-scale study of audio-based AI classifiers as part of the UK government's pandemic response. We collect a dataset of audio recordings from 67,842 individuals with linked metadata, of whom 23,514 had positive PCR tests for SARS-CoV-2. In an unadjusted analysis, similar to that in previous works, AI classifiers predict SARS-CoV-2 infection status with high accuracy (ROC-AUC = 0.846). However, after matching on measured confounders, such as self-reported symptoms, performance is much weaker (ROC-AUC = 0.619). Upon quantifying the utility of audio-based classifiers in practical settings, we find them to be outperformed by predictions based on user-reported symptoms. We make best-practice recommendations for handling recruitment bias and for assessing audio-based classifiers by their utility in relevant practical settings. Our work provides novel insights into the value of AI audio analysis and the importance of study design and treatment of confounders in AI-enabled diagnostics.
The coronavirus disease 2019 (COVID-19) pandemic has been estimated by the World Health Organization (WHO) to have caused 14.9 million excess deaths over the 2020-2021 period. Table S1 summarises nine highly cited datasets and their corresponding classification performance. Here, we analyse the largest PCR-validated dataset collected to date in the field of audio-based COVID-19 screening (ABCS). We design and specify an analysis plan in advance to investigate whether audio-based classifiers can improve the accuracy of COVID-19 screening over self-reported symptoms. Our contribution is as follows:
- We collect a respiratory acoustic dataset of 67,842 individuals with linked PCR test outcomes, including 23,514 who tested positive for COVID-19.
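The ROC-AUC values quoted above can be read as the probability that a randomly chosen positive case receives a higher classifier score than a randomly chosen negative case. The sketch below computes ROC-AUC directly from that definition, counting ties as one half (the normalized Mann-Whitney U statistic); the labels and scores are made up and unrelated to the study's data.

```python
def roc_auc(labels, scores):
    """ROC-AUC as P(score of a random positive > score of a random negative), ties = 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical test outcomes (1 = positive) and classifier scores
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
print(roc_auc(labels, scores))
```

An AUC of 0.5 corresponds to chance-level ranking. The pairwise loop is quadratic in the number of participants; for datasets of the size described above, a rank-based formula or a library routine such as scikit-learn's `roc_auc_score` is more practical.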
Authorship attribution for Differences between Literary Texts by Bilingual Russian-French and Non-Bilingual French Authors
Do bilingual Russian-French authors of the late twentieth century, such as Andreï Makine, Valéry Afanassiev, Vladimir Fédorovski, Iegor Gran and Luba Jurgenson, share common stylistic traits in the novels they wrote in French? Can we distinguish their texts from those of non-bilingual French writers? Is the phenomenon of interference observable in the French texts of Russian authors? This paper applies authorship attribution methods, including Support Vector Machines (SVM), K-Nearest Neighbors (KNN), Ridge classification and Neural Networks, to answer these questions.
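K-Nearest Neighbors is one of the methods applied in this paper. The sketch below shows the general idea of k-NN authorship attribution, assuming character-trigram frequency profiles compared by cosine similarity; the features, the value of k and the toy corpus (with labels "A" and "B") are hypothetical choices, not the paper's setup.

```python
import math
from collections import Counter

def trigram_profile(text):
    """Relative frequencies of character trigrams (whitespace normalized, lower-cased)."""
    text = " ".join(text.lower().split())
    grams = Counter(text[i:i + 3] for i in range(len(text) - 2))
    total = sum(grams.values())
    return {g: c / total for g, c in grams.items()}

def cosine(p, q):
    """Cosine similarity between two sparse frequency profiles."""
    dot = sum(v * q.get(g, 0.0) for g, v in p.items())
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0

def knn_attribute(query, corpus, k=3):
    """Attribute `query` by majority vote among its k most similar labelled texts."""
    qp = trigram_profile(query)
    ranked = sorted(corpus, key=lambda item: cosine(qp, trigram_profile(item[0])), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical labelled snippets for two authors, "A" and "B"
corpus = [
    ("le chat dort sur le tapis rouge", "A"),
    ("le chat mange sur le tapis bleu", "A"),
    ("le chat joue sur le tapis vert", "A"),
    ("une voiture roule vite dans la nuit", "B"),
    ("une voiture passe vite dans la ville", "B"),
    ("une voiture freine vite dans la rue", "B"),
]
print(knn_attribute("le chat court sur le tapis", corpus))
```

Real stylometric studies use much longer texts and richer features (function words, punctuation, n-grams at several levels).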
Variational Quantum Approximate Support Vector Machine with Inference Transfer
Park, Siheon, Park, Daniel K., Rhee, June-Koo Kevin
A kernel-based quantum classifier is the most practical and influential quantum machine learning technique for the hyper-linear classification of complex data. We propose a Variational Quantum Approximate Support Vector Machine (VQASVM) algorithm that demonstrates empirical sub-quadratic run-time complexity with quantum operations feasible even on NISQ computers. As a proof of concept, we tested our algorithm on a toy dataset using cloud-based NISQ machines. We also numerically investigated its performance on the standard Iris flower and MNIST datasets to confirm its practicality and scalability.
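As its name indicates, VQASVM is a variational quantum approximation of a support vector machine. For orientation only, the sketch below trains a classical kernel SVM with kernelized Pegasos (stochastic sub-gradient descent on the hinge-loss objective, without a bias term). It is a classical stand-in, not the VQASVM algorithm; the RBF kernel, the hyper-parameters and the two-blob toy data are assumptions.

```python
import numpy as np

def rbf(A, B, gamma=0.5):
    """Gaussian (RBF) kernel matrix between the rows of A and B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)

def kernel_pegasos(X, y, lam=0.01, n_iter=2000, seed=0):
    """Kernelized Pegasos for a hinge-loss SVM without bias; y holds labels in {-1, +1}."""
    rng = np.random.default_rng(seed)
    K = rbf(X, X)
    counts = np.zeros(len(X))  # number of times each example triggered an update
    for t in range(1, n_iter + 1):
        i = rng.integers(len(X))
        # Scaled margin of example i under the current kernel expansion
        margin = y[i] * (K[i] @ (counts * y)) / (lam * t)
        if margin < 1:
            counts[i] += 1
    scale = 1.0 / (lam * n_iter)

    def decision(Z):
        return scale * (rbf(Z, X) @ (counts * y))

    return decision

# Hypothetical two-class toy data: two Gaussian blobs in 2-D
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, size=(50, 2)), rng.normal(2, 1, size=(50, 2))])
y = np.array([-1.0] * 50 + [1.0] * 50)
decision = kernel_pegasos(X, y)
print("training accuracy:", np.mean(np.sign(decision(X)) == y))
```

Only examples that violated the margin at some step end up with nonzero counts, which mirrors the sparse support-vector expansion of the SVM solution.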