 Support Vector Machines


Short-Term Stock Price Forecasting using exogenous variables and Machine Learning Algorithms

arXiv.org Artificial Intelligence

Creating accurate predictions in the stock market has always been a significant challenge in finance. With the rise of machine learning as the next step in forecasting, this research paper compares four machine learning models and their accuracy in short-term forecasting of three well-known stocks traded on the NYSE from March 2020 to May 2022. We develop, tune, and deploy XGBoost, Random Forest, Multi-layer Perceptron, and Support Vector Regression models. We report the models that produce the highest accuracies according to our evaluation metrics: RMSE, MAPE, MTT, and MPE. Using a training data set of 240 trading days, we find that XGBoost gives the highest accuracy despite running longer (up to 10 seconds). Results from this study may improve by further tuning the individual parameters or introducing more exogenous variables.
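As a rough illustration of three of the error metrics named in this abstract, the sketch below computes RMSE, MAPE, and MPE for a pair of series. The formulas are the common textbook definitions and are an assumption here, since the abstract does not state them; in particular, the sign convention for MPE (actual minus forecast) varies between sources. MTT is not defined in the abstract, so it is left out. The function names and sample values are hypothetical.

```python
import math


def rmse(actual, forecast):
    """Root mean squared error: square root of the mean squared residual."""
    return math.sqrt(
        sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)
    )


def mape(actual, forecast):
    """Mean absolute percentage error, in percent (assumes no zero actuals)."""
    return 100.0 * sum(
        abs((a - f) / a) for a, f in zip(actual, forecast)
    ) / len(actual)


def mpe(actual, forecast):
    """Mean percentage error, in percent; the sign shows forecast bias.

    Assumed convention: (actual - forecast) / actual, so a negative value
    means the forecasts were too high on average.
    """
    return 100.0 * sum(
        (a - f) / a for a, f in zip(actual, forecast)
    ) / len(actual)


if __name__ == "__main__":
    # Hypothetical closing prices and forecasts, for illustration only.
    actual = [100.0, 200.0]
    forecast = [110.0, 190.0]
    print("RMSE:", rmse(actual, forecast))
    print("MAPE:", mape(actual, forecast))
    print("MPE:", mpe(actual, forecast))
```

MAPE and MPE differ only in the absolute value: MAPE measures the size of the errors, while MPE lets over- and under-forecasts cancel and so indicates bias.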


Incremental Outlier Detection Modelling Using Streaming Analytics in Finance & Health Care

arXiv.org Artificial Intelligence

In this paper, we build online models incrementally, using online outlier detection algorithms in a streaming environment. We identified a strong need for streaming models to handle streaming data. The objective of this project is to study and analyze the importance of streaming models that are applicable in real-world environments. In this work, we built various Outlier Detection (OD) algorithms, viz., One-Class Support Vector Machine (OC-SVM), Isolation Forest Adaptive Sliding Window approach (IForest ASD), Exact Storm, Angle-Based Outlier Detection (ABOD), Local Outlier Factor (LOF), KitNet, and KNN ASD. We assessed the effectiveness and validity of these models on various finance problems, such as credit card fraud detection, churn prediction, and Ethereum fraud prediction. Further, we also analyzed the performance of the models on health care prediction problems, such as heart stroke prediction and diabetes prediction. The results show that the models perform well on highly imbalanced datasets, that is, datasets in which the negative class is the majority and the positive class is the minority. Among all the models, the ensemble strategy IForest ASD performed best in most cases, ranking among the top three models in almost all of them.
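To make the sliding-window idea behind methods such as Exact Storm more concrete, here is a minimal sketch of distance-based outlier scoring over a stream. It is a simplified illustration, not the authors' implementation: it handles one-dimensional values only, scores each point once on arrival (it does not re-evaluate older points as the window slides), and does not flag points during warm-up. The function name, parameter names, and sample stream are hypothetical.

```python
from collections import deque


def sliding_window_outliers(stream, window_size, radius, min_neighbors):
    """Flag each arriving value that has too few neighbors in the window.

    A value is flagged as an outlier when fewer than `min_neighbors` of the
    values currently held in the sliding window lie within `radius` of it.
    While the window holds fewer than `min_neighbors` values there is not
    enough evidence, so the value is not flagged.
    """
    window = deque(maxlen=window_size)  # oldest values drop out automatically
    flags = []
    for x in stream:
        if len(window) < min_neighbors:
            flags.append(False)  # warm-up: too little history to judge
        else:
            neighbors = sum(1 for w in window if abs(x - w) <= radius)
            flags.append(neighbors < min_neighbors)
        window.append(x)
    return flags


if __name__ == "__main__":
    # Hypothetical stream with one obvious spike.
    stream = [1.0, 1.1, 0.9, 1.05, 5.0, 1.0, 0.95]
    print(sliding_window_outliers(stream, window_size=4, radius=0.3,
                                  min_neighbors=2))
```

Because the window has a fixed maximum length, memory use stays bounded no matter how long the stream runs, which is the main reason window-based detectors suit streaming data.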


Lp- and Risk Consistency of Localized SVMs

arXiv.org Artificial Intelligence

Kernel-based regularized risk minimizers, also called support vector machines (SVMs), are known to possess many desirable properties but suffer from their super-linear computational requirements when dealing with large data sets. This problem can be tackled by using localized SVMs instead, which also offer the additional advantage of being able to apply different hyperparameters to different regions of the input space. In this paper, localized SVMs are analyzed with regard to their consistency. It is proven that they inherit $L_p$- as well as risk consistency from global SVMs under very weak conditions and even if the regions underlying the localized SVMs are allowed to change as the size of the training data set increases.


A Late Multi-Modal Fusion Model for Detecting Hybrid Spam E-mail

arXiv.org Artificial Intelligence

In recent years, spammers have tried to obfuscate their intent by introducing hybrid spam e-mail that combines image and text parts, which is more challenging to detect than e-mail containing text or images only. The motivation behind this research is to design an effective approach for filtering out hybrid spam e-mails, avoiding situations where traditional text-only or image-only filters fail to detect them. To the best of our knowledge, only a few studies have been conducted with the goal of detecting hybrid spam e-mails. Ordinarily, Optical Character Recognition (OCR) technology is used to eliminate the image parts of spam by transforming images into text. However, although OCR scanning is a very successful technique for processing text-and-image hybrid spam, it is not an effective solution for large volumes because of the CPU power and execution time required to scan e-mail files. Moreover, OCR techniques are not always reliable in the transformation process. To address these problems, we propose new late multi-modal fusion training frameworks for a text-and-image hybrid spam e-mail filtering system and compare them with the classical early fusion detection frameworks based on the OCR method. A Convolutional Neural Network (CNN) and Continuous Bag of Words were implemented to extract features from the image and text parts of hybrid spam, respectively, and the generated features were fed to a sigmoid layer and to Machine Learning based classifiers, including Random Forest (RF), Decision Tree (DT), Naive Bayes (NB), and Support Vector Machine (SVM), to classify each e-mail as ham or spam.


Accelerate Support Vector Clustering via Spectrum-Preserving Data Compression

arXiv.org Artificial Intelligence

This paper proposes a novel framework for accelerating support vector clustering (SVC). The proposed method first computes much smaller compressed data sets while preserving the key cluster properties of the original data sets based on a novel spectral data compression approach. Then, the resultant spectrally-compressed data sets are leveraged for the development of a fast and high-quality algorithm for support vector clustering. We conducted extensive experiments using real-world data sets and obtained very promising results. The proposed method allows us to achieve 100X and 115X speedups over the state-of-the-art SVC method on the Pendigits and USPS data sets, respectively, while achieving even better clustering quality. To the best of our knowledge, this represents the first practical method for high-quality and fast SVC on large-scale real-world data sets.


Using EEG Signals to Assess Workload during Memory Retrieval in a Real-world Scenario

arXiv.org Artificial Intelligence

Objective: The Electroencephalogram (EEG) is gaining popularity as a physiological measure for neuroergonomics in human factor studies because it is objective, less prone to bias, and capable of assessing the dynamics of cognitive states. This study investigated the associations between memory workload and EEG during participants' typical office tasks on a single-monitor and dual-monitor arrangement. We expect a higher memory workload for the single-monitor arrangement. Approach: We designed an experiment that mimics the scenario of a subject performing some office work and examined whether the subjects experienced various levels of memory workload in two different office setups: 1) a single-monitor setup and 2) a dual-monitor setup. We used EEG band power, mutual information, and coherence as features to train machine learning models to classify high versus low memory workload states. Main results: The study results showed that these characteristics exhibited significant differences that were consistent across all participants. We also verified the robustness and consistency of these EEG signatures in a different data set collected during a Sternberg task in a prior study. Significance: The study found the EEG correlates of memory workload across individuals, demonstrating the effectiveness of using EEG analysis in conducting real-world neuroergonomic studies.


Implications of Deep Circuits in Improving Quality of Quantum Question Answering

arXiv.org Artificial Intelligence

Question Answering (QA) has proved to be an arduous challenge in the area of natural language processing (NLP) and artificial intelligence (AI). Over time, many attempts have been made to develop complete solutions for QA, as well as to improve significant sub-modules of QA systems and thereby the overall performance. Questions are the most important piece of QA, because knowing the question is equivalent to knowing what counts as an answer (Harrah in Philos Sci, 1961 [1]). In this work, we have attempted to understand questions in a better way by using Quantum Machine Learning (QML). The properties of Quantum Computing (QC) have enabled classically intractable data processing. So, in this paper, we have performed question classification on questions from two classes of the SelQA (Selection-based Question Answering) dataset using two quantum-based classifier algorithms: quantum support vector machine (QSVM) and variational quantum classifier (VQC) from Qiskit (Quantum Information Science toolKIT) for Python. We perform classification with both classifiers in nearly identical environments and study the effects of circuit depth while comparing the results of both classifiers. We also use these classification results with our own rule-based QA system and observe significant performance improvement. Hence, this experiment has helped in improving the quality of QA in general.


Enhancing Petrophysical Studies with Machine Learning: A Field Case Study on Permeability Prediction in Heterogeneous Reservoirs

arXiv.org Artificial Intelligence

This field case study aims to address the challenge of accurately predicting petrophysical properties in heterogeneous reservoir formations, which can significantly impact reservoir performance predictions. The study employed three machine learning algorithms, namely Artificial Neural Network (ANN), Random Forest Classifier (RFC), and Support Vector Machine (SVM), to predict the permeability log from conventional logs and match it with core data. The primary objective of this study was to compare the effectiveness of the three machine learning algorithms in predicting permeability and determine the optimal prediction method. The study utilized the Flow Zone Indicator (FZI) rock typing technique to understand the factors influencing reservoir quality. The findings will be used to improve reservoir simulation and locate future wells more accurately. The study concluded that the FZI approach and machine learning algorithms are effective in predicting the permeability log and improving reservoir performance predictions.


Enhancing Quantum Support Vector Machines through Variational Kernel Training

arXiv.org Artificial Intelligence

Quantum machine learning (QML) has witnessed immense progress recently, with quantum support vector machines (QSVMs) emerging as a promising model. This paper focuses on the two existing QSVM methods: quantum kernel SVM (QK-SVM) and quantum variational SVM (QV-SVM). While both have yielded impressive results, we present a novel approach that synergizes the strengths of QK-SVM and QV-SVM to enhance accuracy. Our proposed model, quantum variational kernel SVM (QVK-SVM), leverages the quantum kernel and quantum variational algorithm. We conducted extensive experiments on the Iris dataset and observed that QVK-SVM outperforms both existing models in terms of accuracy, loss, and confusion matrix indicators. Our results demonstrate that QVK-SVM holds tremendous potential as a reliable and transformative tool for QML applications. Hence, we recommend its adoption in future QML research endeavors.


Kernel Subspace and Feature Extraction

arXiv.org Artificial Intelligence

We study kernel methods in machine learning from the perspective of feature subspace. We establish a one-to-one correspondence between feature subspaces and kernels and propose an information-theoretic measure for kernels. In particular, we construct a kernel from Hirschfeld-Gebelein-Rényi maximal correlation functions, coined the maximal correlation kernel, and demonstrate its information-theoretic optimality. We use the support vector machine (SVM) as an example to illustrate a connection between kernel methods and feature extraction approaches. We show that the kernel SVM on maximal correlation kernel achieves minimum prediction error. Finally, we interpret the Fisher kernel as a special maximal correlation kernel and establish its optimality.