 Support Vector Machines


Two-stage Best-scored Random Forest for Large-scale Regression

arXiv.org Machine Learning

We propose a novel method for large-scale regression problems, the two-stage best-scored random forest (TBRF). "Best-scored" means selecting, out of a certain number of purely random regression tree candidates, the one with the best empirical performance, and "two-stage" means dividing the original random tree splitting procedure into two stages: in stage one, the feature space is partitioned into non-overlapping cells; in stage two, child trees grow separately on these cells. The strengths of this algorithm can be summarized as follows. First, the pure randomness in TBRF leads to almost optimal learning rates and also makes ensemble learning possible, which resolves the boundary discontinuities that have long plagued existing algorithms. Second, the two-stage procedure paves the way for parallel computing, leading to computational efficiency. Last but not least, TBRF can serve as an inclusive framework in which different mainstream regression strategies, such as linear predictors and least squares support vector machines (LS-SVMs), can be incorporated as value assignment approaches on the leaves of the child trees, depending on the characteristics of the underlying data sets. Numerical comparisons with other state-of-the-art methods on several large-scale real data sets demonstrate the promising prediction accuracy and high computational efficiency of our algorithm.
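The "best-scored" and "two-stage" ideas can be illustrated with a minimal one-dimensional sketch. This is our own simplification, not the authors' implementation: inputs are assumed to lie in [0, 1), cut points are drawn uniformly at random, leaves predict the mean response (the abstract notes that linear predictors or LS-SVMs could be plugged in instead), and all function names and parameters (`stage1_depth`, `child_depth`, `n_candidates`) are hypothetical.

```python
import random

def mean(values, default=0.0):
    return sum(values) / len(values) if values else default

def random_cells(lo, hi, depth, rng):
    """Purely random partition of [lo, hi): cut points are drawn uniformly,
    without looking at the responses."""
    if depth == 0:
        return [(lo, hi)]
    cut = rng.uniform(lo, hi)
    return random_cells(lo, cut, depth - 1, rng) + random_cells(cut, hi, depth - 1, rng)

def in_cell(data, lo, hi):
    return [(x, y) for x, y in data if lo <= x < hi]

def fit_tree(cells, train, default):
    """Value assignment on leaves: mean response of the training points in each leaf."""
    return [(lo, hi, mean([y for _, y in in_cell(train, lo, hi)], default))
            for lo, hi in cells]

def predict_tree(tree, x):
    for lo, hi, value in tree:
        if lo <= x < hi:
            return value
    return tree[-1][2]  # x lies on the right boundary

def tree_mse(tree, data):
    return mean([(predict_tree(tree, x) - y) ** 2 for x, y in data])

def best_scored_tree(lo, hi, train, val, n_candidates, depth, rng, default):
    """'Best-scored': keep the random candidate with the lowest validation error."""
    candidates = [fit_tree(random_cells(lo, hi, depth, rng), train, default)
                  for _ in range(n_candidates)]
    return min(candidates, key=lambda tree: tree_mse(tree, val))

def two_stage_tree(train, val, rng, stage1_depth=2, child_depth=3, n_candidates=5):
    """Stage one: random cells on [0, 1). Stage two: an independent best-scored
    child tree in each cell (each loop iteration could run on its own worker)."""
    default = mean([y for _, y in train])
    model = []
    for lo, hi in random_cells(0.0, 1.0, stage1_depth, rng):
        child = best_scored_tree(lo, hi, in_cell(train, lo, hi), in_cell(val, lo, hi),
                                 n_candidates, child_depth, rng, default)
        model.append((lo, hi, child))
    return model

def predict_two_stage(model, x):
    for lo, hi, child in model:
        if lo <= x < hi:
            return predict_tree(child, x)
    return predict_tree(model[-1][2], x)

def forest_predict(forest, x):
    """Ensemble: average the predictions of independent two-stage trees."""
    return mean([predict_two_stage(model, x) for model in forest])

# Usage on a toy step function
rng = random.Random(0)
data = [(x, 1.0 if x > 0.5 else 0.0) for x in (rng.random() for _ in range(600))]
train, val = data[:400], data[400:]
forest = [two_stage_tree(train, val, rng) for _ in range(10)]
print(forest_predict(forest, 0.1), forest_predict(forest, 0.9))
```

Each stage-two call only touches the data inside its own cell, which is what makes the per-cell loop easy to distribute; averaging many independently partitioned trees is what softens the jumps at the cell boundaries of any single partition.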


Modeling user context for valence prediction from narratives

arXiv.org Machine Learning

Automated prediction of valence, one key feature of a person's emotional state, from individuals' personal narratives may provide crucial information for mental healthcare (e.g. early diagnosis of mental diseases, monitoring of the course of a disease, etc.). In the Interspeech 2018 ComParE Self-Assessed Affect challenge, the task of valence prediction was framed as a three-class classification problem using 8-second fragments from individuals' narratives. As such, the task did not allow for exploring contextual information of the narratives. In this work, we investigate the intrinsic information from multiple narratives recounted by the same individual in order to predict their current state of mind. Furthermore, with generalizability in mind, we focus our experiments exclusively on textual information, as the public availability of audio narratives is limited compared to text. Our hypothesis is that context modeling might provide insights into emotion-triggering concepts (e.g. events, people, places) mentioned in the narratives that are linked to an individual's state of mind. We explore multiple machine learning techniques to model narratives. We find that the models are able to capture inter-individual differences, leading to more accurate predictions of an individual's emotional state than predictions based on single narratives.


Maximal Margin Distribution Support Vector Regression with coupled Constraints-based Convex Optimization

arXiv.org Machine Learning

Support vector regression (SVR) is one of the most popular machine learning algorithms; it aims to generate the optimal regression curve by maximizing the minimal margin of selected training samples, i.e., the support vectors. Recent research reveals that maximizing the margin distribution of the whole training dataset, rather than the minimal margin of a few support vectors, tends to achieve better generalization performance. However, in contrast to the margin distribution strategy for support vector classification, margin distribution support vector regression machines suffer from the difficulty of solving a non-convex quadratic optimization problem. This paper first proposes a maximal margin distribution model for SVR (MMD-SVR), and then introduces a coupled constraint factor to convert the non-convex quadratic optimization into a convex problem with linear constraints, which improves the feasibility and efficiency of training an SVR derived from margin distribution maximization. Theoretical and empirical analyses illustrate the superiority of MMD-SVR. In addition, numerical experiments show that, compared with the classic SVR, MMD-SVR can significantly improve prediction accuracy and generate a smoother regression curve with better generalization.
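For reference, the classic ε-SVR that MMD-SVR is compared against can be written in its standard primal form, where φ is the feature map, C the regularization constant, and ξ_i, ξ_i^* slack variables. The MMD-SVR objective and its coupled constraints are not given in the abstract, so they are not reconstructed here.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi,\,\xi^*}\quad & \tfrac{1}{2}\lVert w\rVert^2 + C\sum_{i=1}^{n}\left(\xi_i + \xi_i^*\right)\\
\text{s.t.}\quad & y_i - w^\top\phi(x_i) - b \le \varepsilon + \xi_i,\\
& w^\top\phi(x_i) + b - y_i \le \varepsilon + \xi_i^*,\\
& \xi_i \ge 0,\ \xi_i^* \ge 0,\qquad i = 1,\dots,n.
\end{aligned}
```

Only training points on or outside the ε-tube receive nonzero dual coefficients; these are the support vectors whose margin the classic formulation depends on.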


On Transfer Learning For Chatter Detection in Turning Using Wavelet Packet Transform and Empirical Mode Decomposition

arXiv.org Machine Learning

The increasing availability of sensor data at machine tools makes automatic chatter detection algorithms a trending topic in metal cutting. Two prominent and advanced methods for feature extraction via signal decomposition are Wavelet Packet Transform (WPT) and Ensemble Empirical Mode Decomposition (EEMD). We apply these two methods to time series acquired from an acceleration sensor at the tool holder of a lathe. Different turning experiments with varying dynamic behavior of the machine tool structure were performed. We compare the performance of these two methods using a Support Vector Machine (SVM) classifier combined with Recursive Feature Elimination (RFE). We also show that the common WPT-based approach of choosing the wavelet packets with the highest energy ratios as representative features for chatter does not always yield packets that enclose the chatter frequency, which reduces the classification accuracy. Further, we test the transfer learning capability of each method by training the classifier on one of the cutting configurations and then testing it on the other cases. When training and testing on data from the same cutting configuration, both methods yield high accuracies, reaching, in one of the cases, 94% for WPT and 91% for EEMD. However, EEMD outperforms WPT in transfer learning applications, with accuracies of up to 84%. Therefore, for systems where the movement of the cutting center leads to significant variations in the stiffness of the machine-tool system, we recommend using EEMD over WPT for training a classifier, because EEMD retains higher accuracy than WPT when the input data stream deviates from the data used to train the classifier.
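The energy-ratio heuristic discussed above can be sketched as follows. This minimal version assumes the Haar wavelet (chosen only for simplicity; the abstract does not say which wavelet the authors used), a signal whose length is divisible by 2**level, and hypothetical helper names. The packets come out in natural (Paley) order rather than in order of increasing frequency, so the index of a high-energy packet has to be mapped to its frequency band before checking whether it encloses the chatter frequency.

```python
import math

def haar_split(node):
    """One Haar analysis step: low-pass (approximation) and high-pass (detail) halves."""
    s = 1 / math.sqrt(2)
    low = [s * (node[i] + node[i + 1]) for i in range(0, len(node) - 1, 2)]
    high = [s * (node[i] - node[i + 1]) for i in range(0, len(node) - 1, 2)]
    return low, high

def wavelet_packets(signal, level):
    """Full packet tree: unlike the plain DWT, the detail branch is split too.
    Returns the 2**level terminal packets in natural (Paley) order."""
    nodes = [list(signal)]
    for _ in range(level):
        nodes = [half for node in nodes for half in haar_split(node)]
    return nodes

def energy_ratios(signal, level):
    energies = [sum(c * c for c in p) for p in wavelet_packets(signal, level)]
    total = sum(energies)
    return [e / total for e in energies]

def top_packets(ratios, k):
    """The common heuristic: keep the k packets with the highest energy ratio."""
    return sorted(range(len(ratios)), key=lambda i: ratios[i], reverse=True)[:k]

# Usage on a synthetic 64-sample tone (a stand-in for an accelerometer record)
signal = [math.sin(2 * math.pi * 5 * t / 64) for t in range(64)]
print(top_packets(energy_ratios(signal, level=3), k=2))
```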


LS-SVR as a Bayesian RBF network

arXiv.org Machine Learning

Statistical learning theory has been studied for general function estimation from data since the late 1960s [22]. However, it was only widely adopted in practice after the introduction of the learning algorithms known as Support Vector Machines (SVMs) [23]. Using the so-called kernel trick, which replaces dot products between features and model parameters with evaluations of a kernel function, SVMs can learn nonlinear relations from training patterns by solving a convex optimization problem [16]. An important variant of the SVM is the Least Squares Support Vector Machine (LS-SVM) [20], which is obtained by making all data points support vectors. LS-SVM avoids the constrained quadratic optimization step of standard SVMs by replacing the training procedure with one that reduces to solving a system of linear equations, which can be performed via ordinary least squares. The first SVM formulation was derived for classification tasks, but it has been readily adapted to tackle regression problems, where it is usually called Support Vector Regression (SVR) [6]. Similarly, the regression counterpart of LS-SVM is the LS-SVR [20].
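In the standard LS-SVM formulation, the linear system mentioned above is [[0, 1ᵀ], [1, K + I/γ]] [b; α] = [0; y], and predictions are f(x) = Σᵢ αᵢ k(x, xᵢ) + b. A minimal sketch, assuming an RBF kernel with width `sigma` and a regularization constant `gamma` (both hypothetical defaults), solves this system directly with Gaussian elimination so that only the standard library is needed.

```python
import math

def rbf(a, b, sigma=1.0):
    return math.exp(-sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / (2 * sigma ** 2))

def solve(A, rhs):
    """Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_lssvr(X, y, gamma=10.0, sigma=1.0):
    """Solve [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]."""
    n = len(X)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        A.append([1.0] + [rbf(X[i], X[j], sigma) + (1.0 / gamma if i == j else 0.0)
                          for j in range(n)])
    sol = solve(A, [0.0] + list(y))
    return sol[0], sol[1:]  # bias b, dual coefficients alpha

def predict_lssvr(X, b, alpha, x, sigma=1.0):
    return b + sum(a * rbf(xi, x, sigma) for a, xi in zip(alpha, X))

X = [[0.0], [1.0], [2.0], [3.0]]
y = [0.0, 1.0, 4.0, 9.0]
b, alpha = fit_lssvr(X, y)
print(predict_lssvr(X, b, alpha, [1.5]))
```

The first row enforces Σᵢ αᵢ = 0, and in general every αᵢ is nonzero, so every training point acts as a support vector; the fitted value at a training point differs from its target by exactly αᵢ/γ.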


High-Performance Support Vector Machines and Its Applications

arXiv.org Machine Learning

The support vector machine (SVM) algorithm is a popular classification technique in data mining and machine learning. In this paper, we propose a distributed SVM algorithm, named high-performance support vector machines (HPSVM), and demonstrate its use in a number of applications. HPSVM makes two major contributions. First, it provides a new way to distribute computations to the machines in the cloud without shuffling the data. Second, it minimizes inter-machine communication in order to maximize performance. We apply HPSVM to several real-world classification problems and compare it with the state-of-the-art SVM technique implemented in R on several public data sets. HPSVM achieves similar or better results.
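The abstract does not describe how HPSVM partitions its computations, so the following is only a generic illustration of the two ideas it names, not HPSVM itself: a linear SVM trained by full-batch subgradient descent in which each data shard stays where it is and only a weight-sized summary (the summed hinge-loss subgradient and a row count) is exchanged per iteration. Shards are simulated as in-memory lists, and all names and hyperparameters (`lam`, `lr`, `iters`) are hypothetical.

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def shard_partial(w, b, shard):
    """Runs where the shard lives: summed hinge-loss subgradient over local rows.
    Only len(w) + 2 numbers are returned; the rows themselves never move."""
    gw, gb = [0.0] * len(w), 0.0
    for x, y in shard:
        if y * (dot(w, x) + b) < 1:  # margin violated: hinge term is active
            for j, xj in enumerate(x):
                gw[j] -= y * xj
            gb -= y
    return gw, gb, len(shard)

def combine(w, partials, lam):
    """Driver: reduce per-shard sums into the subgradient of
    lam/2 * ||w||^2 + (1/n) * sum_i max(0, 1 - y_i * (w.x_i + b))."""
    n = sum(count for _, _, count in partials)
    gw, gb = [lam * wj for wj in w], 0.0
    for pw, pb, _ in partials:
        for j in range(len(w)):
            gw[j] += pw[j] / n
        gb += pb / n
    return gw, gb

def train(shards, dim, lam=0.01, lr=0.1, iters=200):
    w, b = [0.0] * dim, 0.0
    for _ in range(iters):
        partials = [shard_partial(w, b, s) for s in shards]  # one call per machine
        gw, gb = combine(w, partials, lam)
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b

def predict(w, b, x):
    return 1 if dot(w, x) + b >= 0 else -1

# Usage: two simulated machines, each holding its own rows
shards = [[([-2.0], -1), ([1.0], 1)], [([-1.0], -1), ([2.0], 1)]]
w, b = train(shards, dim=1)
print(w, b, [predict(w, b, x) for s in shards for x, _ in s])
```

Communication per iteration scales with the number of features rather than the number of rows; a production system would also use a faster-converging solver than plain subgradient descent.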


Eigen Values Features for the Classification of Brain Signals corresponding to 2D and 3D Educational Contents

arXiv.org Machine Learning

In this paper, we propose a brain signal classification method that uses the eigenvalues of the covariance matrix as features to classify images (topomaps) created from brain signals. The signals are recorded while subjects answer 2D and 3D questions, and the system classifies the answers to both 2D and 3D questions as correct or incorrect. Using this classification technique, the impacts of 2D and 3D multimedia educational contents on learning, memory retention, and recall are compared. The subjects learn similar 2D and 3D educational contents. Afterwards, they are asked 20 multiple-choice questions (MCQs) about the contents after thirty minutes (short-term memory, STM) and after two months (long-term memory, LTM). Eigenvalue features extracted from the topomap images are fed to K-Nearest Neighbor (KNN) and Support Vector Machine (SVM) classifiers in order to identify the brain states related to incorrect and correct answers. Both classifiers obtain excellent accuracies, and statistical analysis of the results indicates no significant difference between 2D and 3D multimedia educational contents on learning, memory retention, and recall in either STM or LTM.
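The feature-extraction step can be sketched as follows. Assumptions: each topomap is given as a matrix whose rows are treated as observations and whose columns are treated as variables (the abstract does not specify how the covariance matrix is formed), and the eigenvalues are computed with the classical Jacobi rotation method so that only the standard library is needed. The sorted eigenvalue vector would then be the input to the KNN or SVM classifier; function names are hypothetical.

```python
import math

def covariance(rows):
    """Sample covariance of the columns of `rows` (rows = observations)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(d)] for i in range(d)]

def jacobi_eigenvalues(A, tol=1e-12, max_rotations=10_000):
    """Eigenvalues of a symmetric matrix via classical Jacobi rotations,
    returned in descending order."""
    a = [row[:] for row in A]
    n = len(a)
    for _ in range(max_rotations):
        # Locate the largest off-diagonal entry a[p][q].
        p, q, largest = 0, 0, 0.0
        for i in range(n):
            for j in range(i + 1, n):
                if abs(a[i][j]) > largest:
                    p, q, largest = i, j, abs(a[i][j])
        if largest < tol:
            break
        # Rotation that zeroes a[p][q].
        theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
        t = (1.0 if theta >= 0 else -1.0) / (abs(theta) + math.sqrt(theta * theta + 1))
        c = 1 / math.sqrt(t * t + 1)
        s = t * c
        apq = a[p][q]
        for r in range(n):
            if r != p and r != q:
                arp, arq = a[r][p], a[r][q]
                a[r][p] = a[p][r] = c * arp - s * arq
                a[r][q] = a[q][r] = s * arp + c * arq
        a[p][p] -= t * apq
        a[q][q] += t * apq
        a[p][q] = a[q][p] = 0.0
    return sorted((a[i][i] for i in range(n)), reverse=True)

def eigen_features(topomap):
    """Feature vector for one topomap: eigenvalues of its covariance matrix."""
    return jacobi_eigenvalues(covariance(topomap))

# Usage with a small made-up 4x3 "topomap" (illustrative values only)
topomap = [[0.1, 0.4, 0.2], [0.3, 0.8, 0.5], [0.2, 0.5, 0.9], [0.6, 0.7, 0.4]]
print(eigen_features(topomap))
```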


Fair Classification and Social Welfare

arXiv.org Artificial Intelligence

Now that machine learning algorithms lie at the center of many resource allocation pipelines, computer scientists have been unwittingly cast as partial social planners. Given this state of affairs, important questions follow. What is the relationship between fairness as defined by computer scientists and notions of social welfare? In this paper, we present a welfare-based analysis of classification and fairness regimes. We translate a loss minimization program into a social welfare maximization problem with a set of implied welfare weights on individuals and groups, weights that can be analyzed through a distributive justice lens. In the converse direction, we ask what the space of possible labelings is for a given dataset and hypothesis class. We provide an algorithm that answers this question with respect to linear hyperplanes in $\mathbb{R}^d$ and runs in $O(n^d d)$ time. Our main findings on the relationship between fairness criteria and welfare center on sensitivity analyses of fairness-constrained empirical risk minimization programs. We characterize the ranges of perturbations $\Delta \epsilon$ to a fairness parameter $\epsilon$ that yield better, worse, and neutral outcomes in utility for individuals and, by extension, groups. We show that applying stricter fairness criteria codified as parity constraints can worsen welfare outcomes for both groups. More generally, always preferring "more fair" classifiers does not abide by the Pareto Principle, a fundamental axiom of social choice theory and welfare economics. Recent work in machine learning has rallied around these notions of fairness as critical to ensuring that algorithmic systems do not have a disparate negative impact on disadvantaged social groups. By showing that these constraints often fail to translate into improved outcomes for these groups, we cast doubt on their effectiveness as a means to ensure justice.
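To make the question about the space of possible labelings concrete, the one-dimensional case (d = 1) can be enumerated directly: a linear classifier sign(w·x + b) on distinct points of a line can only place a single cut between sorted neighbours and choose an orientation, which gives 2n labelings instead of 2^n. The sketch below is our illustration of this special case only, not the paper's algorithm for general d; the function name is hypothetical.

```python
def threshold_labelings(xs):
    """All labelings of distinct 1-D points realizable by sign(w*x + b).
    Each one is a single cut between sorted neighbours plus an orientation."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    labelings = set()
    for k in range(len(xs) + 1):  # the k leftmost points go to the negative side
        labels = [0] * len(xs)
        for rank, i in enumerate(order):
            labels[i] = -1 if rank < k else 1
        labelings.add(tuple(labels))
        labelings.add(tuple(-v for v in labels))  # opposite orientation (w < 0)
    return labelings

xs = [0.5, -1.0, 2.0]
print(len(threshold_labelings(xs)), "of", 2 ** len(xs))
```

Labelings are reported in the original point order, so a pattern such as (-1, 1, 1) for these three points is absent: it would require two cuts on the line.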


Support Vector Regression via a Combined Reward Cum Penalty Loss Function

arXiv.org Machine Learning

In this paper, we introduce a novel combined reward cum penalty loss function for regression. The proposed loss function penalizes data points that lie outside the $\epsilon$-tube of the regressor and rewards data points that lie inside it. The resulting regression model based on the combined reward cum penalty loss function (RP-$\epsilon$-SVR) has several interesting properties, which we investigate in this paper and support with experimental results.
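The abstract does not give the exact formula of the combined loss, so the sketch below uses a hypothetical piecewise-linear shape purely for illustration, next to the standard ε-insensitive loss it modifies: outside the tube both losses grow linearly, while inside the tube the hypothetical version returns a negative value (a reward) scaled by an assumed parameter `tau`.

```python
def eps_insensitive(residual, eps):
    """Standard epsilon-SVR loss: zero inside the tube, linear outside."""
    return max(0.0, abs(residual) - eps)

def reward_penalty(residual, eps, tau):
    """HYPOTHETICAL reward-cum-penalty shape (not the paper's formula):
    linear penalty outside the tube, negative loss (a reward) of slope tau inside."""
    r = abs(residual)
    return r - eps if r > eps else -tau * (eps - r)

for res in (0.0, 0.05, 0.1, 0.3):
    print(res, eps_insensitive(res, eps=0.1), reward_penalty(res, eps=0.1, tau=0.5))
```

With `tau = 0` the hypothetical loss reduces to the ε-insensitive loss; a positive `tau` gives the regressor an incentive to pull points toward the centre of the tube rather than merely keeping them inside it.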


DeepFreak: Learning Crystallography Diffraction Patterns with Automated Machine Learning

arXiv.org Machine Learning

Serial crystallography is the field of science that studies the structure and properties of crystals via diffraction patterns. In this paper, we introduce a new serial crystallography dataset composed of real and synthetic images; the synthetic images are generated by a simulator that is both scalable and accurate. The resulting dataset, called DiffraNet, consists of 25,457 labeled 512x512 grayscale images. We explore several computer vision approaches for classification on DiffraNet, such as standard feature extraction algorithms combined with Random Forests and Support Vector Machines, as well as an end-to-end CNN topology, dubbed DeepFreak, tailored to this new dataset. All implementations are publicly available and have been fine-tuned using off-the-shelf AutoML optimization tools for a fair comparison. Our best model achieves 98.5% accuracy on synthetic images and 94.51% accuracy on real images. We believe that the DiffraNet dataset and its classification methods will have a positive long-term impact in accelerating discoveries in many disciplines, including chemistry, geology, biology, materials science, metallurgy, and physics.