Support Vector Machines
Forecasting Stock Market with Support Vector Regression and Butterfly Optimization Algorithm
Ghanbari, Mohammadreza, Arian, Hamidreza
The problem of forecasting stock price movements is a challenging task for academics and practitioners in the field, owing to market uncertainty from incoming news, nonlinear financial instruments, and behavioral and emotional biases; it is perhaps far more complex than a physicist's task of predicting the course of a comet. Many models have been proposed to address this problem, including Support Vector Regression (SVR), an extension of Support Vector Machines (SVM). Originally introduced by Vapnik for classification problems, SVM was redesigned to solve regression problems in the SVR framework. SVM can solve small-sample, nonlinear and high-dimensional problems by using the structural risk minimization principle instead of the empirical risk minimization principle, which can theoretically guarantee reaching the global optimum [9]. Although SVR has shown strong experimental performance compared to other nonlinear methods [39, 40], its performance depends mainly on the choice of parameters.
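To make the dependence on parameters concrete, here is a minimal sketch of the standard epsilon-insensitive loss and the regularized objective that SVR minimizes. It is not taken from the paper: the function names are hypothetical, and `C` and `epsilon` follow the usual textbook notation for the penalty weight and the width of the error-tolerant tube.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Sum of errors that exceed the epsilon tube; smaller errors cost nothing."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(max(0.0, abs(t - p) - epsilon) for t, p in zip(y_true, y_pred))


def svr_objective(weights, y_true, y_pred, C, epsilon):
    """Textbook linear SVR objective: 0.5 * ||w||^2 + C * (epsilon-insensitive loss)."""
    regularization = 0.5 * sum(w * w for w in weights)
    return regularization + C * epsilon_insensitive_loss(y_true, y_pred, epsilon)


if __name__ == "__main__":
    actual = [1.0, 2.0, 3.0]
    predicted = [1.05, 2.5, 2.0]
    # A wider tube (larger epsilon) ignores more of the error; a larger C
    # makes the remaining error weigh more heavily against the regularizer.
    for eps in (0.0, 0.1, 0.5):
        print(eps, svr_objective([1.0, 2.0], actual, predicted, C=10.0, epsilon=eps))
```

Because both `C` and `epsilon` change which fit counts as best, they are typically tuned by a search procedure rather than fixed by hand.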
Topological Feature Vectors for Chatter Detection in Turning Processes
Yesilli, Melih C., Khasawneh, Firas A., Otto, Andreas
Machining processes are most accurately described using complex dynamical systems that include nonlinearities, time delays and stochastic effects. Due to the nature of these models, as well as practical challenges such as time-varying parameters, the transition from numerical/analytical modeling of machining to the analysis of real cutting signals remains challenging. Some studies have focused on the time series of cutting processes, using machine learning algorithms with the goal of identifying and predicting undesirable vibrations during machining, referred to as chatter. These tools typically decompose the signal using Wavelet Packet Transforms (WPT) or Ensemble Empirical Mode Decomposition (EEMD). However, these methods require significant overhead in identifying the feature vectors before a classifier can be trained. In this study, we present an alternative approach based on featurizing the time series of the cutting process using its topological features. We utilize a support vector machine classifier combined with feature vectors derived from persistence diagrams, a tool from persistent homology, to encode distinguishing characteristics based on embedding the time series as a point cloud using Takens embedding. We present the results for several choices of the topological feature vectors, and we compare our results to the WPT and EEMD methods using experimental time series from a turning cutting test. Our results show that in most cases combining the TDA-based features with a simple Support Vector Machine (SVM) yields accuracies that either exceed or are within the error bounds of their WPT and EEMD counterparts.
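The Takens embedding step mentioned above can be sketched as follows. This is a generic delay-coordinate embedding, not the authors' code; the function name and the `dimension` and `delay` parameters are assumptions chosen for illustration, and the abstract does not say how the authors select them.

```python
def takens_embedding(series, dimension, delay):
    """Map a scalar time series to a point cloud of delay vectors.

    Point i is (x[i], x[i + delay], ..., x[i + (dimension - 1) * delay]).
    """
    if dimension < 1 or delay < 1:
        raise ValueError("dimension and delay must be positive integers")
    n_points = len(series) - (dimension - 1) * delay
    if n_points <= 0:
        raise ValueError("series is too short for this dimension and delay")
    return [
        tuple(series[i + j * delay] for j in range(dimension))
        for i in range(n_points)
    ]


if __name__ == "__main__":
    signal = [0, 1, 2, 3, 4, 5]
    for point in takens_embedding(signal, dimension=3, delay=2):
        print(point)
```

The resulting point cloud is what a persistent homology library would consume to produce persistence diagrams; that step is omitted here.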
A Hybrid Algorithm for Metaheuristic Optimization
Khanna, Sujit Pramod, Ororbia, Alexander II
We propose a novel, flexible algorithm for combining metaheuristic optimizers for non-convex optimization problems. Our approach treats the constituent optimizers as a team of complex agents that communicate information with each other at various intervals during the simulation process. The information produced by each individual agent can be combined in various ways via higher-level operators. In our experiments on key benchmark functions, we investigate how the performance of our algorithm varies with respect to several of its key modifiable properties. Finally, we apply our proposed algorithm to classification problems involving the optimization of support-vector machine classifiers.
Inside a Data Scientist's ToolBox: Top 9 Data Science Algorithms - DataFlair
In a Data Science interview, the interviewer asked me how I would explain the top data science algorithms to a non-tech person. I told him that data science is... (read the article to find out :D). The explanation is so simple that you can easily understand it. We will discuss mostly machine learning algorithms that are important for data scientists and classify them based on supervised and unsupervised roles. I will provide you with an outline of all the important algorithms that you can deploy to improve your data science operations. Here is the list of top Data Science Algorithms that you must know to become a data scientist.
New methods for SVM feature selection
Aladjidi, Tangui, Pasqualini, François
Support Vector Machines have been a popular topic for quite some time now, and as they develop, a need for new methods of feature selection arises. This work presents various approaches to SVM feature selection developed during the author's summer internship at STMicroelectronics. The work focuses on the use of one-class SVMs for wafer testing. A key problem in OC-SVM algorithms is dimensionality reduction, otherwise known as feature selection. Prior to the execution of the proper SVM part of the algorithm, the program first needs to assess which data will be the most significant to take into account.
Predicting failures of Molteno and Baerveldt glaucoma drainage devices using machine learning models
The purpose of this retrospective study is to measure machine learning models' ability to predict glaucoma drainage device failure based on demographic information and preoperative measurements. The medical records of sixty-two patients were used. Potential predictors included the patient's race, age, sex, preoperative intraocular pressure, preoperative visual acuity, number of intraocular pressure-lowering medications, and number and type of previous ophthalmic surgeries. Failure was defined as final intraocular pressure greater than 18 mm Hg, reduction in intraocular pressure less than 20% from baseline, or need for reoperation unrelated to normal implant maintenance. Five classifiers were compared: logistic regression, artificial neural network, random forest, decision tree, and support vector machine.
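The failure definition above translates directly into a labeling rule. The sketch below is illustrative only: the function and argument names are hypothetical, pressures are assumed to be in mm Hg, and `reoperation_needed` is assumed to already exclude reoperations that are part of normal implant maintenance.

```python
def is_device_failure(baseline_iop, final_iop, reoperation_needed):
    """Label one case as a failure using the three criteria from the abstract."""
    if baseline_iop <= 0:
        raise ValueError("baseline_iop must be positive")
    # Fractional reduction in intraocular pressure relative to baseline.
    reduction = (baseline_iop - final_iop) / baseline_iop
    return bool(
        final_iop > 18          # final pressure above 18 mm Hg
        or reduction < 0.20     # pressure reduced by less than 20%
        or reoperation_needed   # reoperation unrelated to maintenance
    )


if __name__ == "__main__":
    print(is_device_failure(30.0, 16.0, False))
    print(is_device_failure(20.0, 17.0, False))
```

Labels produced this way would serve as the binary target for each of the five classifiers being compared.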
A Futuristic Reality: Harnessing The Power Of The Three Layers Of Machine Learning
Google Assistant can draw on voice command, as seen here at the Google I/O conference in 2018, with the help of machine learning techniques. Artificial intelligence systems powered by machine learning have been creating headlines with applications as varied as making restaurant reservations by phone, sorting cucumbers, and distinguishing chihuahuas from muffins. Media buzz aside, many fast-growing startups are taking advantage of machine learning (ML) techniques like neural networks and support vector machines to learn from data, make predictions, improve products, and enhance business decisions. Unfortunately, "machine learning theater" (companies pretending to use the technology to make theirs seem more sophisticated for a higher valuation) is also on the rise. Undeniably, ML is transforming businesses and industries, with some more likely to benefit than others.
Augmenting Physiological Time Series Data: A Case Study for Sleep Apnea Detection
Nikolaidis, Konstantinos, Kristiansen, Stein, Goebel, Vera, Plagemann, Thomas, Liestøl, Knut, Kankanhalli, Mohan
The quantity of labelled data is small due to privacy concerns and the cost of data acquisition and labelling by a medical expert. Furthermore, collected data are commonly unbalanced, and getting enough data to personalize models for individuals is very expensive or even infeasible. This paper addresses these problems by (1) designing a recurrent Generative Adversarial Network to generate realistic synthetic data and to augment the original dataset, (2) enabling the generation of balanced datasets from heavily unbalanced datasets, and (3) controlling the data generation in such a way that the generated data resemble data from specific individuals. We apply these solutions to sleep apnea detection and, in the evaluation, study the performance of four well-known techniques, i.e., K-Nearest Neighbour, Random Forest, Multi-Layer Perceptron, and Support Vector Machine.
Conformal Prediction Interval Estimations with an Application to Day-Ahead and Intraday Power Markets
Kath, Christopher, Ziel, Florian
We discuss a concept denoted as Conformal Prediction (CP) in this paper. While initially stemming from the world of machine learning, it was never applied or analyzed in the context of short-term electricity price forecasting. Therefore, we elaborate the aspects that render Conformal Prediction worthwhile to know and explain why its simple yet very efficient idea has worked in other fields of application and why its characteristics are promising for short-term power applications as well. We compare its performance with different state-of-the-art electricity price forecasting models such as quantile regression averaging (QRA) in an empirical out-of-sample study for three short-term electricity time series. We combine Conformal Prediction with various underlying point forecast models to demonstrate its versatility and behavior under changing conditions. Our findings suggest that Conformal Prediction yields sharp and reliable prediction intervals in short-term power markets. We further inspect the effect each of Conformal Prediction's model components has and provide a path-based guideline on how to find the best CP model for each market.
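For readers new to the idea, the following is a minimal sketch of split (inductive) conformal prediction wrapped around an arbitrary point forecast. It is a generic textbook variant using absolute residuals as the conformity score, not necessarily the configuration the authors evaluate; all function names are hypothetical.

```python
import math


def conformal_quantile(residuals, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest absolute residual."""
    n = len(residuals)
    scores = sorted(abs(r) for r in residuals)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points to support this coverage level.
        return math.inf
    return scores[rank - 1]


def prediction_interval(point_forecast, calib_actuals, calib_forecasts, alpha):
    """Symmetric interval around a point forecast with target coverage 1 - alpha."""
    residuals = [a - f for a, f in zip(calib_actuals, calib_forecasts)]
    half_width = conformal_quantile(residuals, alpha)
    return (point_forecast - half_width, point_forecast + half_width)


if __name__ == "__main__":
    # Hypothetical held-out calibration prices and the model's forecasts for them.
    actuals = [41.0, 38.0, 43.0, 36.0, 45.0, 34.0, 47.0, 32.0, 49.0]
    forecasts = [40.0] * 9
    print(prediction_interval(50.0, actuals, forecasts, alpha=0.5))
```

The point forecast model is treated as a black box, which is why the method can be combined with many different underlying forecasters.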
Better Technical Debt Detection via SURVEYing
Fahid, Fahmid M., Yu, Zhe, Menzies, Tim
Software analytics can be improved by surveying; i.e., rechecking and (possibly) revising the labels offered by prior analysis. Surveying is a time-consuming task, and effective surveyors must carefully manage their time. Specifically, they must balance the cost of further surveying against the additional benefits of that extra effort. This paper proposes SURVEY0, an incremental Logistic Regression estimation method that implements cost/benefit analysis. Some classifier is used to rank the as-yet-unvisited examples according to how interesting they might be. Humans then review the most interesting examples, after which their feedback is used to update an estimator of how many examples remain. This paper evaluates SURVEY0 in the context of self-admitted technical debt. As software projects mature, they can accumulate "technical debt", i.e., developer decisions which are sub-optimal and decrease the overall quality of the code. Such decisions are often commented on by programmers in the code; i.e., it is self-admitted technical debt (SATD). Recent results show that text classifiers can automatically detect such debt. We find that we can significantly outperform prior results by SURVEYing the data. Specifically, for ten open-source JAVA projects, we can find 83% of the technical debt via SURVEY0 using just 16% of the comments (and if higher levels of recall are required, SURVEY0 can adjust towards that with some additional effort).