 Support Vector Machines


Towards Learning Representations of Binary Executable Files for Security Tasks

arXiv.org Machine Learning

Tackling binary analysis problems has traditionally implied manually defining rules and heuristics. As an alternative, we are suggesting using machine learning models for learning distributed representations of binaries that can be applicable for a number of downstream tasks. We construct a computational graph from the binary executable and use it with a graph convolutional neural network to learn a high dimensional representation of the program. We show the versatility of this approach by using our representations to solve two semantically different binary analysis tasks -- algorithm classification and vulnerability discovery. We compare the proposed approach to our own strong baseline as well as published results and demonstrate improvement on the state of the art methods for both tasks.


Wasserstein Exponential Kernels

arXiv.org Machine Learning

In the context of kernel methods, the similarity between data points is encoded by the kernel function which is often defined thanks to the Euclidean distance, a common example being the squared exponential kernel. Recently, other distances relying on optimal transport theory - such as the Wasserstein distance between probability distributions - have shown their practical relevance for different machine learning techniques. In this paper, we study the use of exponential kernels defined thanks to the regularized Wasserstein distance and discuss their positive definiteness. More specifically, we define Wasserstein feature maps and illustrate their interest for supervised learning problems involving shapes and images. Empirically, Wasserstein squared exponential kernels are shown to yield smaller classification errors on small training sets of shapes, compared to analogous classifiers using Euclidean distances.
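As a rough illustration of the idea in this abstract, the sketch below builds a squared exponential kernel from an entropy-regularized optimal transport cost between two histograms on a shared grid. It is not the authors' implementation: the Sinkhorn routine, the squared-distance ground cost, the helper `bump`, and the parameters `sigma`, `eps`, and `n_iter` are all assumptions chosen for the example, and nothing here guarantees positive definiteness.

```python
import numpy as np


def sinkhorn_cost(a, b, C, eps=0.05, n_iter=500):
    """Transport cost <P, C> of the entropy-regularized plan between histograms a and b."""
    K = np.exp(-C / eps)              # Gibbs kernel of the ground cost
    u = np.ones_like(a)
    for _ in range(n_iter):           # Sinkhorn fixed-point iterations
        v = b / (K.T @ u)
        u = a / (K @ v)
    plan = u[:, None] * K * v[None, :]
    return float(np.sum(plan * C))


def wasserstein_sq_exp_kernel(a, b, C, sigma=0.5, eps=0.05):
    """exp(-W / (2 sigma^2)), with W the regularized squared-distance transport cost."""
    return float(np.exp(-sinkhorn_cost(a, b, C, eps) / (2.0 * sigma ** 2)))


# Toy data (hypothetical): two bumps on a 1-D grid.
grid = np.linspace(0.0, 1.0, 20)
C = (grid[:, None] - grid[None, :]) ** 2   # squared Euclidean ground cost


def bump(center, width=0.1):
    w = np.exp(-0.5 * ((grid - center) / width) ** 2)
    return w / w.sum()                     # normalize to a probability histogram


p, q = bump(0.3), bump(0.7)
k_pp = wasserstein_sq_exp_kernel(p, p, C)
k_pq = wasserstein_sq_exp_kernel(p, q, C)
```

In this toy setting the kernel value between a histogram and itself is larger than between two well-separated histograms, which is the similarity behaviour a classifier would rely on.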


Improved Subsampled Randomized Hadamard Transform for Linear SVM

arXiv.org Machine Learning

Subsampled Randomized Hadamard Transform (SRHT), a popular random projection method that can efficiently project $d$-dimensional data into an $r$-dimensional space ($r \ll d$) in $O(d\log(d))$ time, has been widely used to address the challenge of high dimensionality in machine learning. SRHT works by rotating the input data matrix $\mathbf{X} \in \mathbb{R}^{n \times d}$ with a Randomized Walsh-Hadamard Transform, followed by uniform column sampling on the rotated matrix. Despite its advantages, one limitation of SRHT is that it generates the new low-dimensional embedding without considering any specific properties of a given dataset. Therefore, this data-independent random projection method may result in inferior and unstable performance when used for a particular machine learning task, e.g., classification. To overcome this limitation, we analyze the effect of using SRHT for random projection in the context of linear SVM classification. Based on our analysis, we propose importance sampling and deterministic top-$r$ sampling, instead of the uniform sampling used in SRHT, to produce effective low-dimensional embeddings. In addition, we propose a new supervised non-uniform sampling method. Our experimental results demonstrate that the proposed methods achieve higher classification accuracy than SRHT and other random projection methods on six real-life datasets.
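The baseline SRHT described in this abstract (random sign flips, a Walsh-Hadamard rotation, then uniform column sampling) can be sketched as follows. This is a minimal illustration, not the paper's code: the function names `fwht` and `srht`, the zero-padding to a power of two, and the $\sqrt{d/r}$ rescaling are assumptions, and the importance, top-$r$, and supervised sampling variants proposed by the authors are not shown.

```python
import numpy as np


def fwht(X):
    """Unnormalized fast Walsh-Hadamard transform of each row of X.

    The number of columns must be a power of two.
    """
    X = np.array(X, dtype=float)  # work on a copy
    n, d = X.shape
    if d < 1 or d & (d - 1):
        raise ValueError("number of columns must be a power of two")
    h = 1
    while h < d:
        # Butterfly step: combine entries h apart inside blocks of size 2h.
        blocks = X.reshape(n, d // (2 * h), 2, h)
        top = blocks[:, :, 0, :] + blocks[:, :, 1, :]
        bottom = blocks[:, :, 0, :] - blocks[:, :, 1, :]
        X = np.stack((top, bottom), axis=2).reshape(n, d)
        h *= 2
    return X


def srht(X, r, rng=None):
    """Project the rows of X (n x d) to r dimensions with uniform column sampling."""
    rng = np.random.default_rng(rng)
    n, d = X.shape
    d_pad = 1 << (d - 1).bit_length()            # next power of two (assumption: zero-pad)
    X_pad = np.zeros((n, d_pad))
    X_pad[:, :d] = X
    signs = rng.choice([-1.0, 1.0], size=d_pad)  # random diagonal sign matrix
    rotated = fwht(X_pad * signs) / np.sqrt(d_pad)   # orthogonal rotation
    cols = rng.choice(d_pad, size=r, replace=False)  # uniform column sampling
    return rotated[:, cols] * np.sqrt(d_pad / r)     # rescale to keep norms in expectation


X_demo = np.random.default_rng(0).normal(size=(5, 8))
Z_demo = srht(X_demo, r=4, rng=1)   # shape (5, 4)
```

Each butterfly pass costs $O(d)$ per row and there are $\log_2 d$ passes, which is where the $O(d\log(d))$ projection time comes from.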


Machine Learning Course with SAS, Free Trial

#artificialintelligence

Get seven free days to experience our Machine Learning With SAS Viya course. Learn the theoretical foundation for different techniques associated with supervised machine learning models. You'll develop a series of supervised learning models, including decision tree, ensemble of trees (forest and gradient boosting), neural networks and support vector machines.


A Hybrid Two-layer Feature Selection Method Using Genetic Algorithm and Elastic Net

arXiv.org Machine Learning

Feature selection, as a critical pre-processing step for machine learning, aims at determining representative predictors from a high-dimensional feature space to improve prediction accuracy. However, the increase in feature space dimensionality, compared to the number of observations, poses a severe challenge to many existing feature selection methods with respect to computational efficiency and prediction performance. This paper presents a new hybrid two-layer feature selection approach that combines a wrapper and an embedded method to construct an appropriate subset of predictors. In the first layer of the proposed method, the Genetic Algorithm (GA) is adopted as a wrapper to search for the optimal subset of predictors, with the aim of reducing both the number of predictors and the prediction error. As one of the meta-heuristic approaches, GA is selected for its computational efficiency; however, GAs do not guarantee optimality. To address this issue, a second layer is added to the proposed method to eliminate any remaining redundant or irrelevant predictors and improve the prediction accuracy. Elastic Net (EN) is selected as the embedded method in the second layer because of its flexibility in adjusting the penalty terms in the regularization process and its time efficiency. This hybrid two-layer approach is applied to a maize genetic dataset from the NAM population, which consists of multiple subsets of data with different ratios of the number of predictors to the number of observations. The numerical results confirm the superiority of the proposed model.


Fast quantum learning with statistical guarantees

arXiv.org Machine Learning

A wide class of quantum algorithms for learning problems exploit fast quantum linear algebra subroutines to achieve runtimes that are exponentially faster than their classical counterparts [Cil18]. Examples of these algorithms are quantum support vector machines [RML14], quantum linear regression [WBL12; SSP16], and quantum least squares [KP17; CGJ18]. A careful analysis of these algorithms identified a number of caveats that limit their practical applicability, such as the need for a strong form of quantum access to the input data, restrictions on structural properties of the data matrix (such as condition number or sparsity), and modes of access to the output [Aar15]. Furthermore, if one assumes that it is efficient to (classically) sample elements of the training data in a way proportional to their norm, then it is possible to show that classical algorithms are only polynomially slower (albeit the scaling of the quantum algorithms can be considerably better) [Tan18; CLW18; Chi19a; GLT18; Chi19b]. In this work we continue to investigate the limitations of quantum algorithms for learning problems.


OPFython: A Python-Inspired Optimum-Path Forest Classifier

arXiv.org Machine Learning

Machine learning techniques have been paramount in recent years, being applied to a wide range of tasks, such as classification, object recognition, person identification, and image segmentation, among others. Nevertheless, conventional classification algorithms, e.g., Logistic Regression, Decision Trees, and Bayesian classifiers, might lack complexity and diversity, making them unsuitable for real-world data. A recent graph-inspired classifier, known as the Optimum-Path Forest, has proven to be a state-of-the-art technique, comparable to Support Vector Machines and even surpassing them in some tasks. In this paper, we propose a Python-based Optimum-Path Forest framework, denoted as OPFython, where all of its functions and classes are based upon the original C language implementation. Additionally, as OPFython is a Python-based library, it provides a friendlier environment and a faster prototyping workspace than the C language.


Improving Generalizability of Fake News Detection Methods using Propensity Score Matching

arXiv.org Machine Learning

Recently, due to the booming influence of online social networks, detecting fake news has drawn significant attention from both academic communities and the general public. In this paper, we consider the existence of confounding variables in the features of fake news and use Propensity Score Matching (PSM) to select generalizable features in order to reduce the effects of the confounding variables. Experimental results show that fake news detection methods generalize significantly better when features are selected using PSM rather than raw frequency. We investigate multiple types of fake news detection methods (classifiers), such as logistic regression, random forests, and support vector machines, and consistently observe performance improvements.


Supervised Learning for Non-Sequential Data with the Canonical Polyadic Decomposition

arXiv.org Machine Learning

There has recently been increasing interest, both theoretical and practical, in utilizing tensor networks for the analysis and design of machine learning systems. In particular, a framework has been proposed that can handle both dense data (e.g., standard regression or classification tasks) and sparse data (e.g., recommender systems), unlike support vector machines and traditional deep learning techniques. Namely, it can be interpreted as applying local feature mappings to the data and, through the outer product operator, modelling all interactions of functions of the features; the corresponding weights are represented as a tensor network for computational tractability. In this paper, we derive efficient prediction and learning algorithms for supervised learning with the Canonical Polyadic (CP) decomposition, including suitable regularization and initialization schemes. We empirically demonstrate that the CP-based model performs at least on par with the existing models based on the Tensor Train (TT) decomposition on standard non-sequential tasks, and better on MovieLens 100K. Furthermore, in contrast to previous works which applied two-dimensional local feature maps to the data, we generalize the framework to handle arbitrarily high-dimensional maps, in order to gain a powerful lever on the expressiveness of the model. In order to enhance its stability and generalization capabilities, we propose a normalized version of the feature maps. Our experiments show that this version leads to dramatic improvements over the unnormalized and/or two-dimensional maps, as well as to performance on non-sequential supervised learning tasks that compares favourably with popular models, including neural networks.


LIBTwinSVM: A Library for Twin Support Vector Machines

arXiv.org Machine Learning

Jalal A. Nasiri ‡ j.nasiri@irandoc.ac.ir
† Faculty of Electrical and Computer Engineering, Islamic Azad University, North Tehran Branch, Tehran, Iran
‡ Iranian Research Institute for Information Science and Technology (IranDoc), Tehran, Iran

Abstract: This paper presents LIBTwinSVM, a free, efficient, and open source library for Twin Support Vector Machines (TSVMs). Our library provides a set of useful functionalities such as fast TSVM estimators, model selection, visualization, a graphical user interface (GUI) application, and a Python application programming interface (API). The benchmark results indicate the effectiveness of the LIBTwinSVM library for large-scale classification problems.

Keywords: TwinSVM, classification, open source, GUI, API

1. Introduction: Twin Support Vector Machine (TSVM) is an extension of the Support Vector Machine (SVM), which was proposed by Jayadeva et al. (2007). TSVM performs binary classification using two nonparallel hyperplanes, each of which is as close as possible to the samples of its own class and far from the samples of the other class. The two nonparallel hyperplanes are obtained by solving two smaller-sized Quadratic Programming Problems (QPPs).
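For reference, the two QPPs mentioned above are commonly written as follows for the linear case. The notation is an assumption not taken from this excerpt: $A$ and $B$ hold the samples of the two classes as rows, $e_1$ and $e_2$ are all-ones vectors of matching length, $\xi$ and $\eta$ are slack vectors, and $c_1, c_2 > 0$ are penalty parameters.

$$
\min_{w_1, b_1, \xi} \; \tfrac{1}{2}\,\lVert A w_1 + e_1 b_1 \rVert^2 + c_1\, e_2^{\top} \xi
\quad \text{s.t.} \quad -(B w_1 + e_2 b_1) + \xi \ge e_2, \;\; \xi \ge 0
$$

$$
\min_{w_2, b_2, \eta} \; \tfrac{1}{2}\,\lVert B w_2 + e_2 b_2 \rVert^2 + c_2\, e_1^{\top} \eta
\quad \text{s.t.} \quad (A w_2 + e_1 b_2) + \eta \ge e_1, \;\; \eta \ge 0
$$

In this form, each problem has constraints only for the samples of one class, which is why the QPPs are smaller than the single problem solved by a standard SVM. A new point $x$ is then typically assigned to the class whose hyperplane $x^{\top} w_i + b_i = 0$ lies closer to it.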