Support Vector Machines
Instance-based entropy fuzzy support vector machine for imbalanced data
Cho, Poongjin, Lee, Minhyuk, Chang, Woojin
Imbalanced classification has been a major challenge for machine learning because many standard classifiers mainly focus on balanced datasets and tend to produce results biased towards the majority class. We modify the entropy fuzzy support vector machine (EFSVM) and introduce the instance-based entropy fuzzy support vector machine (IEFSVM). Both EFSVM and IEFSVM use the entropy information of the k-nearest neighbors to determine a fuzzy membership value for each sample, which sets the importance of that sample. IEFSVM considers the diversity of entropy patterns for each sample as the neighborhood size k increases, while EFSVM uses a single entropy value computed from a fixed neighborhood size for all samples. By varying k, we can reflect how the composition of a sample's neighbors changes from near to far in the determination of the fuzzy membership value. Numerical experiments on 35 public and 12 real-world imbalanced datasets are performed to validate IEFSVM, and the area under the receiver operating characteristic curve (AUC) is used to compare its performance with other SVMs and machine learning methods. IEFSVM shows a much higher AUC value for datasets with a high imbalance ratio, implying that IEFSVM is effective in dealing with the class imbalance problem.
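The idea of tracking neighbor entropy as k grows can be illustrated with a minimal sketch. It only computes the class entropy of the k nearest neighbors for several values of k; how IEFSVM turns such a profile into a fuzzy membership value is not stated in the abstract and is not attempted here. The function names, the Euclidean distance, and the toy data are assumptions made for illustration.

```python
import math
from collections import Counter


def neighbor_entropy(labels):
    """Shannon entropy (in bits) of the class labels in a neighborhood."""
    n = len(labels)
    counts = Counter(labels)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())


def entropy_profile(points, labels, index, k_values):
    """Entropy of the k nearest neighbors of points[index], one value per k.

    Assumption: Euclidean distance; the sample itself is excluded.
    """
    target = points[index]
    others = [
        (math.dist(target, p), lab)
        for i, (p, lab) in enumerate(zip(points, labels))
        if i != index
    ]
    others.sort(key=lambda pair: pair[0])
    return [neighbor_entropy([lab for _, lab in others[:k]]) for k in k_values]


# Hypothetical toy data: three samples of class 1, three of class 0.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (1.0, 1.0), (1.2, 1.0), (0.3, 0.3)]
labels = [1, 1, 1, 0, 0, 0]

# Entropy seen by the first sample as its neighborhood grows.
profile = entropy_profile(points, labels, index=0, k_values=[1, 2, 3, 4])
```

In this toy example the entropy is zero while the neighborhood contains only same-class points and rises once samples of the other class enter it, which is the kind of near-to-far change a single fixed k cannot capture.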
Emotion Recognition from Speech based on Relevant Feature and Majority Voting
Sarker, Md. Kamruzzaman, Alam, Kazi Md. Rokibul, Arifuzzaman, Md.
This paper proposes an approach to detect emotion from human speech by applying majority voting over several machine learning techniques. The contribution of this work is twofold: first, it selects the speech features that are most promising for classification; second, it uses majority voting to select the exact class of emotion. Here, majority voting is applied over a Neural Network (NN), a Decision Tree (DT), a Support Vector Machine (SVM) and K-Nearest Neighbor (KNN). The input vector of NN, DT, SVM and KNN consists of various acoustic and prosodic features such as pitch and Mel-Frequency Cepstral Coefficients. Many features are extracted from the speech signal, and only the promising ones are selected. To decide whether a feature is promising, Fast Correlation-based feature selection (FCBF) and Fisher score algorithms are used, and only those features that are highly ranked by both of them are selected. The proposed approach has been tested on the Berlin dataset of emotional speech [3] and the Electromagnetic Articulography (EMA) dataset [4]. The experimental results show that majority voting attains better accuracy than the individual machine learning techniques. The proposed approach can be used to recognize human emotion in applications such as social robots, intelligent chat clients, and company call centers.
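The two steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the feature names, the classifier outputs, the `top_n` cutoff used to decide what counts as "highly ranked", and the tie-breaking rule are all assumptions.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the most classifiers.

    Assumption: ties go to the label that appears first in `predictions`.
    """
    return Counter(predictions).most_common(1)[0][0]


def select_features(ranking_a, ranking_b, top_n):
    """Keep features that both rankings place in their top_n (ranking_a order)."""
    top_b = set(ranking_b[:top_n])
    return [f for f in ranking_a[:top_n] if f in top_b]


# Hypothetical per-classifier predictions for one utterance.
votes = {"NN": "anger", "DT": "joy", "SVM": "anger", "KNN": "sadness"}
winner = majority_vote(list(votes.values()))

# Hypothetical feature rankings, best first, from the two selection methods.
fcbf_rank = ["pitch", "mfcc_1", "energy", "mfcc_2"]
fisher_rank = ["mfcc_1", "pitch", "mfcc_3", "energy"]
selected = select_features(fcbf_rank, fisher_rank, top_n=3)
```

Here `winner` is the label two of the four classifiers agree on, and `selected` keeps only the features that appear in the top three of both rankings.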
Predicting Concreteness and Imageability of Words Within and Across Languages via Word Embeddings
Ljubešić, Nikola, Fišer, Darja, Peti-Stantić, Anita
The notions of concreteness and imageability, traditionally important in psycholinguistics, are gaining significance in semantic-oriented natural language processing tasks. In this paper we investigate the predictability of these two concepts via supervised learning, using word embeddings as explanatory variables. We perform predictions both within and across languages by exploiting collections of cross-lingual embeddings aligned to a single vector space. We show that the notions of concreteness and imageability are highly predictable both within and across languages, with a moderate loss of up to 20% in correlation when predicting across languages. We further show that the cross-lingual transfer via word embeddings is more efficient than the simple transfer via bilingual dictionaries.
VFPred: A Fusion of Signal Processing and Machine Learning techniques in Detecting Ventricular Fibrillation from ECG Signals
Ibtehaz, Nabil, Rahman, M. Saifur, Rahman, M. Sohel
Ventricular Fibrillation (VF), one of the most dangerous arrhythmias, is responsible for sudden cardiac arrests. Thus, various algorithms have been developed to predict VF from the Electrocardiogram (ECG), which is a binary classification problem. In the literature, we find a number of algorithms based on signal processing, where, after some robust mathematical operations, the decision is made by applying a predefined threshold to a single value. On the other hand, some machine learning based algorithms are also reported in the literature; however, these algorithms merely combine some parameters and make a prediction using those as features. Both approaches have their perks and pitfalls; thus our motivation was to combine them to get the best of both worlds. [Preprint submitted to Pattern Recognition, July 10, 2018.] [...] a Support Vector Machine for efficient classification. VFPred turns out to be a robust algorithm, as it is able to successfully separate the two classes with equal confidence (Sensitivity 99.99%, Specificity 98.40%) even from a short signal of 5 seconds, whereas existing works, though they require longer signals, excel in one but fail in the other. Keywords: Electrocardiogram (ECG), Empirical Mode Decomposition, Heart Arrhythmia, Support Vector Machine, Ventricular Fibrillation (VF). 1. Introduction. Ventricular Fibrillation (VF) is a type of cardiac arrhythmia which occurs when the heart quivers instead of pumping due to a disturbance in electrical activity in the ventricles [1]. This arrhythmia may result in a cardiac arrest, leaving the patient unconscious without any pulse. Ventricular Fibrillation is found initially in about 10% of people in cardiac arrest [2], and sudden cardiac arrest is responsible for approximately 6 million deaths in Europe and in the United States [3]. Therefore, fast and accurate detection of Ventricular Fibrillation can save many lives.
Approximate Leave-One-Out for Fast Parameter Tuning in High Dimensions
Wang, Shuaiwen, Zhou, Wenda, Lu, Haihao, Maleki, Arian, Mirrokni, Vahab
Consider the following class of learning schemes: $$\hat{\boldsymbol{\beta}} := \arg\min_{\boldsymbol{\beta}}\;\sum_{i=1}^n \ell(\boldsymbol{x}_i^\top\boldsymbol{\beta}; y_i) + \lambda R(\boldsymbol{\beta}),\qquad\qquad (1) $$ where $\boldsymbol{x}_i \in \mathbb{R}^p$ and $y_i \in \mathbb{R}$ denote the $i^{\text{th}}$ feature vector and response variable, respectively. Let $\ell$ and $R$ be the loss function and regularizer, $\boldsymbol{\beta}$ denote the unknown weights, and $\lambda$ be a regularization parameter. Finding the optimal choice of $\lambda$ is a challenging problem in high-dimensional regimes where both $n$ and $p$ are large. We propose two frameworks to obtain a computationally efficient approximation (ALO) of the leave-one-out cross validation (LOOCV) risk for nonsmooth losses and regularizers. Our two frameworks are based on the primal and dual formulations of (1). We prove the equivalence of the two approaches under smoothness conditions. This equivalence enables us to justify the accuracy of both methods under such conditions. We use our approaches to obtain a risk estimate for several standard problems, including generalized LASSO, nuclear norm regularization, and support vector machines. We empirically demonstrate the effectiveness of our results for non-differentiable cases.
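To see why avoiding n refits is attractive, it may help to look at the classical smooth special case of (1): squared loss with a squared penalty (ridge regression), where the leave-one-out residual has a well-known exact closed form, (y_i - x_i β̂) / (1 - h_i). The sketch below checks that identity in one dimension without an intercept. It is not the ALO method of the paper, which targets nonsmooth losses and regularizers; the function names and data are assumptions for illustration.

```python
def ridge_fit(xs, ys, lam):
    """1-D ridge without intercept: argmin_b sum (y - x*b)^2 + lam * b^2."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def loo_brute_force(xs, ys, lam):
    """Leave-one-out squared error computed by refitting n times."""
    errors = []
    for i in range(len(xs)):
        b = ridge_fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:], lam)
        errors.append((ys[i] - xs[i] * b) ** 2)
    return sum(errors) / len(errors)


def loo_shortcut(xs, ys, lam):
    """Same quantity from a single fit, using residual_i / (1 - h_i)."""
    denom = sum(x * x for x in xs) + lam
    b = ridge_fit(xs, ys, lam)
    errors = [
        ((y - x * b) / (1.0 - x * x / denom)) ** 2  # h_i = x_i^2 / denom
        for x, y in zip(xs, ys)
    ]
    return sum(errors) / len(errors)


# Hypothetical data and regularization strength.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.1, 1.9, 3.2, 3.9]
lam = 0.5

brute = loo_brute_force(xs, ys, lam)
fast = loo_shortcut(xs, ys, lam)
```

The two values agree up to floating-point error, while the shortcut needs one fit instead of n. Approximations in the spirit of ALO aim for a similar saving in settings where no exact identity of this kind is available.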
A Supervised Geometry-Aware Mapping Approach for Classification of Hyperspectral Images
Mohanty, Ramanarayan, Happy, S L, Routray, Aurobinda
The multi-path scattering of light within a pixel [1], bidirectional reflectance distribution [2], and the heterogeneity of sub-pixel constituents [3] are the major concerns in hyperspectral (HS) data classification. These nonlinear properties naturally place the HS data in a non-Euclidean space. Handling such high-dimensional, redundant data in a non-Euclidean space is one of the major bottlenecks in HS data analysis. Typically, HS classification consists of dimensionality reduction (DR) followed by a classification step. Popular DR methods such as principal component analysis (PCA) [4] and linear discriminant analysis (LDA) [5] are linear and operate on Euclidean structures. These linear DR methods miss the curved, nonlinear structures of the HS data. On the other hand, manifold learning helps in recovering compact, meaningful low-dimensional structures from complex high-dimensional data in a non-Euclidean space. Manifold learning methods assume that real-world high-dimensional data are generated with only a few degrees of freedom [6]. This leads to the projection of the data into a lower-dimensional space while preserving their underlying geometrical structure [7].
An Unsupervised Learning Classifier with Competitive Error Performance
An unsupervised learning classification model is described. It achieves a classification error probability competitive with that of popular supervised learning classifiers such as SVM or kNN. The model is based on the incremental execution of small-step shift and rotation operations upon selected discriminative hyperplanes at the arrival of input samples. When applied, in conjunction with a selected feature extractor, to a subset of the ImageNet benchmark dataset, it yields a 6.2% Top-3 probability of error; this exceeds by only about 2% the result achieved by (supervised) k-Nearest Neighbor, both using the same feature extractor. This result may also be contrasted with popular unsupervised learning schemes such as k-Means, which is shown to be practically useless on the same dataset.
ColdRoute: Effective Routing of Cold Questions in Stack Exchange Sites
Sun, Jiankai, Vishnu, Abhinav, Chakrabarti, Aniket, Siegel, Charles, Parthasarathy, Srinivasan
Routing questions in Community Question Answer services (CQAs) such as Stack Exchange sites is a well-studied problem. Yet cold-start, a phenomenon observed when a new question is posted, is not well addressed by existing approaches. Additionally, cold questions posted by new askers present significant challenges to state-of-the-art approaches. We propose ColdRoute to address these challenges. ColdRoute is able to handle the task of routing cold questions posted by new or existing askers to matching experts. Specifically, we use Factorization Machines on the one-hot encoding of critical features such as question tags and compare our approach to well-studied techniques such as CQARank and semantic matching (LDA, BoW, and Doc2Vec). Using data from eight Stack Exchange sites, we are able to improve upon the routing metrics (Precision@1, Accuracy, MRR) over state-of-the-art models such as semantic matching by 159.5%, 31.84%, [...] Keywords: question routing · expert finding · cold-start problem · question answering services. 1 Introduction. Nowadays, community-based question answering sites (CQAs) such as Stack Overflow, Stack Exchange sites, and Quora, which enable people to post questions and answers in various domains [Yang et al., 2013], have accumulated millions [...] (Author note: Aniket Chakrabarti, Microsoft; work done while at The Ohio State University. Email: chakrabarti.14@osu.edu.) One important task in CQAs is to make recommendations for new questions (routing questions), which fall into three scenarios: 1) find experts. In this paper, we focus on the problem of expert finding [Xu et al., 2012, Zhao et al., 2013, Yang et al., 2013, Fang et al., 2016, Zhao et al., 2016, Zhao et al., 2017], which is to choose the right experts for answering questions posted by users in Stack Exchange, a network of question-and-answer (Q&A) websites covering topics in various fields.
Each Stack Exchange site covers a specific topic. Usually there are two types of questions in CQAs - resolved (questions with answers) and newly posted questions (questions that have not received any answers).
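The abstract above mentions Factorization Machines applied to one-hot encoded features such as question tags. A minimal sketch of the standard second-order Factorization Machine score, w0 + Σ w_i x_i + Σ_{i<j} ⟨v_i, v_j⟩ x_i x_j, is given below. The vocabulary, the weights, and the choice to encode the candidate expert as a feature are hypothetical; ColdRoute's actual feature set and training procedure are not described in this excerpt.

```python
def one_hot(active, vocabulary):
    """Binary indicator vector over `vocabulary` for the active items."""
    return [1.0 if item in active else 0.0 for item in vocabulary]


def fm_score(x, w0, w, v):
    """Second-order factorization machine score for feature vector x.

    w0: global bias, w: per-feature weights, v: per-feature latent vectors.
    """
    linear = sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dot = sum(a * b for a, b in zip(v[i], v[j]))
            pairwise += dot * x[i] * x[j]
    return w0 + linear + pairwise


# Hypothetical vocabulary mixing question tags and candidate experts.
vocabulary = ["tag:python", "tag:svm", "expert:alice", "expert:bob"]

# Hypothetical (untrained) parameters, chosen only to make the arithmetic easy.
w0 = 0.1
w = [0.0, 0.2, 0.3, -0.1]
v = [[1.0, 0.0], [0.5, 0.5], [1.0, 0.0], [0.0, 1.0]]

# Score the pairing of a question tagged "svm" with the expert "alice".
x = one_hot({"tag:svm", "expert:alice"}, vocabulary)
score = fm_score(x, w0, w, v)
```

Because a tag never seen together with a given expert still has its own latent vector, the pairwise term can produce a score for that pairing, which is one reason this model family is a natural fit for cold questions.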
Supervised vs. Unsupervised Learning – Towards Data Science
Supervised learning is typically done in the context of classification, when we want to map input to output labels, or regression, when we want to map input to a continuous output. Common algorithms in supervised learning include logistic regression, naive Bayes, support vector machines, artificial neural networks, and random forests. In both regression and classification, the goal is to find specific relationships or structure in the input data that allow us to effectively produce correct output data. Note that "correct" output is determined entirely from the training data, so while we do have a ground truth that our model will assume is true, this does not mean that data labels are always correct in real-world situations. Noisy or incorrect data labels will clearly reduce the effectiveness of your model.
Multi-Merge Budget Maintenance for Stochastic Gradient Descent SVM Training
Qaadan, Sahar, Glasmachers, Tobias
Budgeted Stochastic Gradient Descent (BSGD) is a state-of-the-art technique for training large-scale kernelized support vector machines. The budget constraint is maintained incrementally by merging two points whenever the pre-defined budget is exceeded. The process of finding suitable merge partners is costly; it can account for up to 45% of the total training time. In this paper we investigate computationally more efficient schemes that merge more than two points at once. We obtain significant speed-ups without sacrificing accuracy.