Support Vector Machines
SIFT and SURF based feature extraction for the anomaly detection
Abstract: In this paper, we suggest a way, how to use SIFT and SURF algorithms to extract the image features for anomaly detection. We use those feature vectors to train various classifiers on a real-world dataset in the semi -supervised (with a small number of faulty samples) manner with a large number of classifiers and in the one-class (with no faulty samples) manner using the SVDD and SVM classifier. We prove, that the SIFT and SURF algorithms could be used as feature extractors, that they could be used to train a semi-supervised and one-class classifier with an accuracy around 89% and that the performance of the one-class classifier could be comparable to the semi-supervised one. We also made our dataset and source code publicly available. Keywords: Anomaly detection, Object descriptors, Machine Learning, SIFT, SURF 1 INTRODUCTION The anomaly detection (AD) techniques are state-of the art methods for outliers detection in industrial, or medical diagnosis tasks.
An approach to the Gaussian RBF kernels via Fock spaces
Alpay, Daniel, Colombo, Fabrizio, Diki, Kamal, Sabadini, Irene
We use methods from the Fock space and Segal-Bargmann theories to prove several results on the Gaussian RBF kernel in complex analysis. The latter is one of the most used kernels in modern machine learning kernel methods, and in support vector machines (SVMs) classification algorithms. Complex analysis techniques allow us to consider several notions linked to the RBF kernels like the feature space and the feature map, using the so-called Segal-Bargmann transform. We show also how the RBF kernels can be related to some of the most used operators in quantum mechanics and time frequency analysis, specifically, we prove the connections of such kernels with creation, annihilation, Fourier, translation, modulation and Weyl operators. For the Weyl operators, we also study a semigroup property in this case.
How Computer-Aided Diagnosis works part1
Abstract: Computer-aided methods have shown added value for diagnosing and predicting brain disorders and can thus support decision making in clinical care and treatment planning. This chapter will provide insight into the type of methods, their working, their input data -- such as cognitive tests, imaging and genetic data -- and the types of output they provide. We will focus on specific use cases for diagnosis, i.e. estimating the current'condition' of the patient, such as early detection and diagnosis of dementia, differential diagnosis of brain tumours, and decision making in stroke. Regarding prediction, i.e. estimation of the future'condition' of the patient, we will zoom in on use cases such as predicting the disease course in multiple sclerosis and predicting patient outcomes after treatment in brain cancer. Furthermore, based on these use cases, we will assess the current state-of-the-art methodology and highlight current efforts on benchmarking of these methods and the importance of open science therein.
How Quantum Support Vector Machines are used part1
Abstract: Aerodynamics plays an important role in aviation industry and aircraft design. Detecting and minimizing the phenomenon of flow separation from scattered pressure data on airfoil is critical for ensuring stable and efficient aviation. However, since it is challenging to understand the mechanics of flow field separation, the aerodynamic parameters are emphasized for the identification and control of flow separation. It has been investigated extensively using traditional algorithms and machine learning methods such as the support vector machine (SVM) models. Recently, a growing interest in quantum computing and its applications among wide research communities sheds light upon the use of quantum techniques to solve aerodynamic problems. In this paper, we apply qSVM, a quantum SVM algorithm based on the quantum annealing model, to identify whether there is flow separation, with their performance in comparison to the widely-used classical SVM.
How Quantum Support Vector Machines are used part2
Abstract: Quantum computers have the potential to speed up certain computational tasks. A possibility this opens up within the field of machine learning is the use of quantum techniques that may be inefficient to simulate classically but could provide superior performance in some tasks. Machine learning algorithms are ubiquitous in particle physics and as advances are made in quantum machine learning technology there may be a similar adoption of these quantum techniques. In this work a quantum support vector machine (QSVM) is implemented for signal-background classification. We investigate the effect of different quantum encoding circuits, the process that transforms classical data into a quantum state, on the final classification performance.
Imbalanced Class Data Performance Evaluation and Improvement using Novel Generative Adversarial Network-based Approach: SSG and GBO
Ahsan, Md Manjurul, Ali, Md Shahin, Siddique, Zahed
Class imbalance in a dataset is one of the major challenges that can significantly impact the performance of machine learning models resulting in biased predictions. Numerous techniques have been proposed to address class imbalanced problems, including, but not limited to, Oversampling, Undersampling, and cost-sensitive approaches. Due to its ability to generate synthetic data, oversampling techniques such as the Synthetic Minority Oversampling Technique (SMOTE) is among the most widely used methodology by researchers. However, one of SMOTE's potential disadvantages is that newly created minor samples may overlap with major samples. As an effect, the probability of ML models' biased performance towards major classes increases. Recently, generative adversarial network (GAN) has garnered much attention due to its ability to create almost real samples. However, GAN is hard to train even though it has much potential. This study proposes two novel techniques: GAN-based Oversampling (GBO) and Support Vector Machine-SMOTE-GAN (SSG) to overcome the limitations of the existing oversampling approaches. The preliminary computational result shows that SSG and GBO performed better on the expanded imbalanced eight benchmark datasets than the original SMOTE. The study also revealed that the minor sample generated by SSG demonstrates Gaussian distributions, which is often difficult to achieve using original SMOTE.
Estimating oil and gas recovery factors via machine learning: Database-dependent accuracy and reliability
Roustazadeh, Alireza, Ghanbarian, Behzad, Shadmand, Mohammad B., Taslimitehrani, Vahid, Lake, Larry W.
With recent advances in artificial intelligence, machine learning (ML) approaches have become an attractive tool in petroleum engineering, particularly for reservoir characterizations. A key reservoir property is hydrocarbon recovery factor (RF) whose accurate estimation would provide decisive insights to drilling and production strategies. Therefore, this study aims to estimate the hydrocarbon RF for exploration from various reservoir characteristics, such as porosity, permeability, pressure, and water saturation via the ML. We applied three regression-based models including the extreme gradient boosting (XGBoost), support vector machine (SVM), and stepwise multiple linear regression (MLR) and various combinations of three databases to construct ML models and estimate the oil and/or gas RF. Using two databases and the cross-validation method, we evaluated the performance of the ML models. In each iteration 90 and 10% of the data were respectively used to train and test the models. The third independent database was then used to further assess the constructed models. For both oil and gas RFs, we found that the XGBoost model estimated the RF for the train and test datasets more accurately than the SVM and MLR models. However, the performance of all the models were unsatisfactory for the independent databases. Results demonstrated that the ML algorithms were highly dependent and sensitive to the databases based on which they were trained. Statistical tests revealed that such unsatisfactory performances were because the distributions of input features and target variables in the train datasets were significantly different from those in the independent databases (p-value < 0.05).
Abstract Interpretation-Based Feature Importance for SVMs
Pal, Abhinandan, Ranzato, Francesco, Urban, Caterina, Zanella, Marco
We propose a symbolic representation for support vector machines (SVMs) by means of abstract interpretation, a well-known and successful technique for designing and implementing static program analyses. We leverage this abstraction in two ways: (1) to enhance the interpretability of SVMs by deriving a novel feature importance measure, called abstract feature importance (AFI), that does not depend in any way on a given dataset of the accuracy of the SVM and is very fast to compute, and (2) for verifying stability, notably individual fairness, of SVMs and producing concrete counterexamples when the verification fails. We implemented our approach and we empirically demonstrated its effectiveness on SVMs based on linear and non-linear (polynomial and radial basis function) kernels. Our experimental results show that, independently of the accuracy of the SVM, our AFI measure correlates much more strongly with the stability of the SVM to feature perturbations than feature importance measures widely available in machine learning software such as permutation feature importance. It thus gives better insight into the trustworthiness of SVMs.
Machine Learning Sensors for Diagnosis of COVID-19 Disease Using Routine Blood Values for Internet of Things Application
Velichko, Andrei, Huyut, Mehmet Tahir, Belyaev, Maksim, Izotov, Yuriy, Korzun, Dmitry
Healthcare digitalization requires effective applications of human sensors, when various parameters of the human body are instantly monitored in everyday life due to the Internet of Things (IoT). In particular, machine learning (ML) sensors for the prompt diagnosis of COVID-19 are an important option for IoT application in healthcare and ambient assisted living (AAL). Determining a COVID-19 infected status with various diagnostic tests and imaging results is costly and time-consuming. This study provides a fast, reliable and cost-effective alternative tool for the diagnosis of COVID-19 based on the routine blood values (RBVs) measured at admission. The dataset of the study consists of a total of 5296 patients with the same number of negative and positive COVID-19 test results and 51 routine blood values. In this study, 13 popular classifier machine learning models and the LogNNet neural network model were exanimated. The most successful classifier model in terms of time and accuracy in the detection of the disease was the histogram-based gradient boosting (HGB) (accuracy: 100%, time: 6.39 sec). The HGB classifier identified the 11 most important features (LDL, cholesterol, HDL-C, MCHC, triglyceride, amylase, UA, LDH, CK-MB, ALP and MCH) to detect the disease with 100% accuracy. In addition, the importance of single, double and triple combinations of these features in the diagnosis of the disease was discussed. We propose to use these 11 features and their binary combinations as important biomarkers for ML sensors in the diagnosis of the disease, supporting edge computing on Arduino and cloud IoT service.
Machine Learning Approach for Predicting Students Academic Performance and Study Strategies based on their Motivation
Orji, Fidelia A., Vassileva, Julita
This research aims to develop machine learning models for students academic performance and study strategies prediction which could be generalized to all courses in higher education. Key learning attributes (intrinsic, extrinsic, autonomy, relatedness, competence, and self-esteem) essential for students learning process were used in building the models. Determining the broad effect of these attributes on students' academic performance and study strategy is the center of our interest. To investigate this, we used Scikit-learn in python to build five machine learning models (Decision Tree, K-Nearest Neighbour, Random Forest, Linear/Logistic Regression, and Support Vector Machine) for both regression and classification tasks to perform our analysis. The models were trained, evaluated, and tested for accuracy using 924 university dentistry students' data collected by Chilean authors through quantitative research design. A comparative analysis of the models revealed that the tree-based models such as the random forest (with prediction accuracy of 94.9%) and decision tree show the best results compared to the linear, support vector, and k-nearest neighbours. The models built in this research can be used in predicting student performance and study strategy so that appropriate interventions could be implemented to improve student learning progress. Thus, incorporating strategies that could improve diverse student learning attributes in the design of online educational systems may increase the likelihood of students continuing with their learning tasks as required. Moreover, the results show that the attributes could be modelled together and used to adapt/personalize the learning process.