Support Vector Machines
Evaluating All Possible Combinations of Hyperparameters -Grid Search-
The model and the preprocessing are specific to each project. Hyperparameters are tuned to the dataset, and reusing the same hyperparameters across projects compromises the accuracy of the results. For example, the Logistic Regression algorithm has hyperparameters such as 'solver', 'C', and 'penalty', and different combinations of these give different results. Similarly, a Support Vector Machine has adjustable parameters such as gamma and C, and different combinations of them also give different results. The hyperparameters of each algorithm are documented on the sklearn website.
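The idea of evaluating every combination can be sketched in a few lines of plain Python. This is a minimal illustration, not the sklearn implementation: the grid values, the `grid_search` helper, and the `toy_score` function are hypothetical, and `toy_score` only stands in for a real cross-validated accuracy measured on your own dataset.

```python
import math
from itertools import product


def grid_search(param_grid, evaluate):
    """Score every combination in param_grid and return the best one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    # product() enumerates the full Cartesian product of the value lists.
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical grid over the two SVM hyperparameters mentioned in the text.
svm_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.001, 0.01, 0.1]}


def toy_score(params):
    # Stand-in for cross-validated accuracy (an assumption for illustration);
    # it is constructed to peak at C=10, gamma=0.01.
    return (1.0
            - 0.05 * abs(math.log10(params["C"]) - 1)
            - 0.05 * abs(math.log10(params["gamma"]) + 2))


best_params, best_score = grid_search(svm_grid, toy_score)
print(best_params, best_score)
```

With 4 values of C and 3 values of gamma, the loop evaluates 12 combinations. In practice, sklearn's `GridSearchCV` performs the same exhaustive enumeration and adds cross-validation for each combination.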
Cough Detection Using Selected Informative Features from Audio Signals
Chen, Xinru, Hu, Menghan, Zhai, Guangtao
Cough is a common symptom of respiratory and lung diseases. Cough detection is important for preventing, assessing, and controlling epidemics such as COVID-19. This paper proposes a model to detect cough events from cough audio signals. The models are trained on a dataset that combines the ESC-50 dataset with self-recorded cough recordings. The test dataset contains cough recordings collected from inpatients of the respiratory disease department at Ruijin Hospital. In total, we build 15 cough detection models based on different numbers of features selected by the Random Frog, Uninformative Variable Elimination (UVE), and Variable Influence on Projection (VIP) algorithms, respectively. The optimal model is based on 20 features selected from Mel Frequency Cepstral Coefficients (MFCC) features by the UVE algorithm and classified with a linear two-class Support Vector Machine (SVM) classifier. The best cough detection model achieves an accuracy, recall, precision, and F1-score of 94.9%, 97.1%, 93.1%, and 0.95, respectively. Its excellent performance with a lower-dimensional feature vector shows its potential for use on mobile devices such as smartphones, thus making cough detection remote and non-contact.
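As a quick consistency check on the reported metrics, the F1-score is the harmonic mean of precision and recall. The short sketch below uses only the values stated in the abstract; the function name `f1_from_precision_recall` is ours.

```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract (93.1% and 97.1%).
f1 = f1_from_precision_recall(0.931, 0.971)
print(round(f1, 2))  # 0.95
```

The result agrees with the reported F1-score of 0.95.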
Analysis of Driving Scenario Trajectories with Active Learning
Jarl, Sanna, Rahrovani, Sadegh, Chehreghani, Morteza Haghir
Annotating driving scenario trajectories based only on explicit rules (i.e., knowledge-based methods) can be subject to errors, such as false positive/negative classification of scenarios that lie on the border of two scenario classes, missed unknown scenario classes, and anomalies. On the other hand, having annotators verify the labels is not cost-efficient. Active learning (AL) could therefore improve the annotation procedure by including an annotator/expert in an efficient way. In this study, we develop an active learning framework to annotate driving trajectory time-series data. As a first step, we compute an embedding of the time-series trajectories into a latent space in order to capture their temporal nature. For this purpose, we study three different latent space representations: multivariate Time Series t-Distributed Stochastic Neighbor Embedding (mTSNE), Recurrent Auto-Encoder (RAE), and Variational Recurrent Auto-Encoder (VRAE). We then apply different active learning paradigms with different classification models to the embedded data. In particular, we study two classifiers, Neural Network (NN) and Support Vector Machines (SVM), with three active learning query strategies (i.e., entropy, margin, and random). Finally, we explore the possibilities of the framework to discover unknown classes and demonstrate how it can be used to identify out-of-class trajectories.
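The three query strategies named above can be illustrated with a small sketch. This is not the authors' code: it assumes the classifier outputs a list of class probabilities for each unlabeled sample, and the helper names and the example pool are hypothetical.

```python
import math
import random


def entropy_score(probs):
    """Shannon entropy of a class-probability vector (higher = more uncertain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def margin_score(probs):
    """Gap between the two most likely classes (lower = more uncertain)."""
    top, second = sorted(probs, reverse=True)[:2]
    return top - second


def select_query(pool_probs, strategy, rng=None):
    """Return the index of the unlabeled sample to send to the annotator."""
    indices = list(range(len(pool_probs)))
    if strategy == "entropy":
        return max(indices, key=lambda i: entropy_score(pool_probs[i]))
    if strategy == "margin":
        return min(indices, key=lambda i: margin_score(pool_probs[i]))
    if strategy == "random":
        return (rng or random).choice(indices)
    raise ValueError(f"unknown strategy: {strategy}")


# Hypothetical predicted probabilities for three unlabeled trajectories.
pool = [
    [0.90, 0.05, 0.05],  # confident prediction
    [0.40, 0.35, 0.25],  # spread over all classes: highest entropy
    [0.50, 0.50, 0.00],  # tie between two classes: smallest margin
]

print(select_query(pool, "entropy"))  # 1
print(select_query(pool, "margin"))   # 2
```

The example pool is chosen so that the entropy and margin strategies pick different samples, which shows that the two uncertainty measures are not interchangeable.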
Fake News and Phishing Detection Using a Machine Learning Trained Expert System
Fitzpatrick, Benjamin, Liang, Xinyu "Sherwin", Straub, Jeremy
Expert systems have been used to enable computers to make recommendations and decisions. This paper presents the use of a machine learning trained expert system (MLES) for phishing site detection and fake news detection. Both topics share a similar goal: to design a rule-fact network that allows a computer to make explainable decisions like domain experts in each respective area. The phishing website detection study uses a MLES to detect potential phishing websites by analyzing site properties (like URL length and expiration time). The fake news detection study uses a MLES rule-fact network to gauge news story truthfulness based on factors such as emotion, the speaker's political affiliation status, and job. The two studies use different MLES network implementations, which are presented and compared herein. The fake news study utilized a more linear design while the phishing project utilized a more complex connection structure. Both networks' inputs are based on commonly available data sets.
An Effective Leaf Recognition Using Convolutional Neural Networks Based Features
Quach, Boi M., Cuong, Dinh V., Pham, Nhung, Huynh, Dang, Nguyen, Binh T.
The loss of plant habitats worldwide is a warning sign that calls for concerted efforts to conserve plant biodiversity. Thus, plant species classification is of crucial importance in addressing this environmental challenge. In recent years, there has been a considerable increase in the number of studies related to plant taxonomy. While some researchers try to improve recognition performance using novel approaches, others concentrate on computational optimization of their frameworks. In addition, a few studies focus on feature extraction to achieve significant gains in accuracy. In this paper, we propose an effective method for the leaf recognition problem. In our proposed approach, a leaf goes through some pre-processing to extract its refined color image, vein image, xy-projection histogram, handcrafted shape, texture features, and Fourier descriptors. These attributes are then transformed into a better representation by neural network-based encoders before a support vector machine (SVM) model is used to classify different leaves. Overall, our approach achieves a state-of-the-art result on the Flavia leaf dataset, reaching an accuracy of 99.58% on test sets under random 10-fold cross-validation and surpassing previous methods. We also release our code (scripts are available at https://github.com/dinhvietcuong1996/LeafRecognition) as a contribution to the research community working on the leaf classification problem.
Electrical peak demand forecasting- A review
Dai, Shuang, Meng, Fanlin, Dai, Hongsheng, Wang, Qian, Chen, Xizhong
The power system is undergoing rapid evolution with the roll-out of advanced metering infrastructure and local energy applications (e.g., electric vehicles), as well as the increasing penetration of intermittent renewable energy at both the transmission and distribution levels. These changes make peak load demand more random and less predictable, and therefore pose a threat to power grid security. Since storing large quantities of electricity to satisfy load demand is neither economical nor environmentally friendly, effective peak demand management strategies and reliable peak load forecast methods become essential for optimizing power system operations. To this end, this paper provides a timely and comprehensive overview of peak load demand forecast methods in the literature. To the best of our knowledge, this is the first comprehensive review on this topic. In this paper, we first give a precise and unified problem definition of peak load demand forecasting. Second, 139 papers on peak load forecast methods are systematically reviewed, with the methods classified into different stages based on the timeline. Third, a comparative analysis of peak load forecast methods is summarized, and different optimization methods for improving forecast performance are discussed. The paper ends with a comprehensive summary of the reviewed papers and a discussion of potential future research directions.
Combining Machine Learning Classifiers for Stock Trading with Effective Feature Extraction
The prevalence of volatility in the stock market makes predicting stock prices anything but simple. Before investing, investors perform two kinds of analysis [patel2015predicting]. The first of these is fundamental analysis, where investors look into the value of stocks, the industry performance, economic factors, etc., and decide whether or not to invest. Technical analysis is the second, more advanced, analysis, which involves evaluating those stocks through the use of statistics and activity in the current market, such as volume traded and previous price levels [patel2015predicting]. Technical analysts use charts to recognise patterns and try to predict how a stock price will change. Malkiel and Fama's efficient market hypothesis states that stock prices are informationally efficient, i.e., that they fully reflect the available financial information [malkiel1970efficient].
Character Spotting Using Machine Learning Techniques
Preethi, P, Viswanath, Hrishikesh
This work presents a comparison of machine learning algorithms that are implemented to segment the characters of text presented as an image. The algorithms are designed to work on degraded documents with text that is not aligned in an organized fashion. The paper investigates the use of Support Vector Machines, K-Nearest Neighbor algorithm and an Encoder Network to perform the operation of character spotting. Character Spotting involves extracting potential characters from a stream of text by selecting regions bound by white space.
A Survey on Data-driven Software Vulnerability Assessment and Prioritization
Le, Triet H. M., Chen, Huaming, Babar, M. Ali
Software Vulnerabilities (SVs) are increasing in complexity and scale, posing great security risks to many software systems. Given the limited resources in practice, SV assessment and prioritization help practitioners devise optimal SV mitigation plans based on various SV characteristics. The surge in SV data sources and data-driven techniques such as Machine Learning and Deep Learning has taken SV assessment and prioritization to the next level. Our survey provides a taxonomy of past research efforts and highlights the best practices for data-driven SV assessment and prioritization. We also discuss the current limitations and propose potential solutions to address such issues.
Support Vector Machines -- the basics
The important job that SVMs perform is to find a decision boundary to classify our data. This decision boundary is also called the hyperplane. Let's start with an example to explain it. Visually, if you look at figure 1, you will see that it makes sense for the purple line to be a better hyperplane than the black line. The black line will also do the job, but it passes a little too close to one of the red points to be a good decision boundary.
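The intuition that one separating line is "better" than another can be made concrete by measuring each line's margin, i.e., its distance to the closest data point. The sketch below uses made-up 2D points and two made-up lines standing in for the purple and black lines; the coordinates are assumptions for illustration and are not taken from figure 1.

```python
import math


def geometric_margin(w, b, points, labels):
    """Smallest signed distance from any point to the line w.x + b = 0.

    The value is positive only if every point lies on its correct side.
    """
    norm = math.hypot(*w)
    return min(
        y * (w[0] * x[0] + w[1] * x[1] + b) / norm
        for x, y in zip(points, labels)
    )


# Hypothetical toy data: three points per class.
points = [(3, 3), (4, 3), (3, 4), (1, 1), (0, 1), (1, 0)]
labels = [+1, +1, +1, -1, -1, -1]

# Two candidate lines; both separate the classes.
purple = ((1.0, 1.0), -4.0)  # x + y - 4 = 0, midway between the classes
black = ((1.0, 1.0), -2.2)   # x + y - 2.2 = 0, close to the point (1, 1)

margin_purple = geometric_margin(*purple, points, labels)
margin_black = geometric_margin(*black, points, labels)
print(margin_purple, margin_black)
```

Both margins are positive, so both lines classify the toy data correctly, but the purple line keeps a much larger distance from its nearest point. An SVM searches for the separating hyperplane that makes this margin as large as possible.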