Support Vector Machines
Random depthwise signed convolutional neural networks
Random weights in convolutional neural networks have shown promising results in previous studies yet remain below par compared to trained networks on image benchmarks. We explore depthwise convolutional neural networks with thousands of random filters in each layer, the sign activation function in between layers, and training performed only at the last layer with a linear support vector machine. We show that our network attains higher accuracies than previous random networks and is comparable to trained large networks on large images from the STL10 and ImageNet benchmarks. Since our network lacks a gradient due to the sign activation, it is not possible to produce gradient-based adversarial examples targeting it. We show that our network is also less affected by gradient-based adversarial examples produced from state-of-the-art networks that considerably hamper their performance. As a possible explanation for our network's accuracy with random weights, we show that the margin of the linear support vector machine is larger on our final representation compared to the original dataset and increases with the number of random filters. Our network is simple and fast to train and predict, attains high classification accuracy particularly on large images, is hard to attack with adversarial examples, and is less affected by gradient-based adversarial examples compared to state-of-the-art networks.
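The feature-extraction idea in the abstract above (random depthwise filters followed by a sign activation, with no training in these layers) can be illustrated with a minimal sketch. It is not the authors' implementation: it uses 1D signals instead of images, Gaussian filter weights, tiny layer sizes, and a hypothetical helper `random_sign_layer`, all of which are assumptions made for illustration. The final step, fitting a linear support vector machine on the resulting features, is omitted.

```python
import random


def sign(v):
    # Sign activation; zero is mapped to +1 here (an assumption of this sketch).
    return 1 if v >= 0 else -1


def random_sign_layer(channels, filters_per_channel, width, rng):
    """Depthwise layer: each input channel is convolved with its own random
    filters (no mixing across channels), then passed through sign()."""
    out = []
    for signal in channels:
        for _ in range(filters_per_channel):
            # Random, untrained filter weights (Gaussian is an assumption).
            kernel = [rng.gauss(0.0, 1.0) for _ in range(width)]
            out.append([
                sign(sum(k * s for k, s in zip(kernel, signal[i:i + width])))
                for i in range(len(signal) - width + 1)
            ])
    return out


rng = random.Random(0)
x = [[0.1, 0.5, -0.3, 0.8, 0.0, -0.6]]     # one input channel of length 6
h1 = random_sign_layer(x, 4, 3, rng)       # 4 output channels of length 4
h2 = random_sign_layer(h1, 2, 2, rng)      # 8 output channels of length 3
features = [v for channel in h2 for v in channel]  # flat vector of +1/-1 values
```

Because every layer outputs only +1 or -1, the map from input to `features` is piecewise constant, which is why no useful gradient with respect to the input exists for a gradient-based attack.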
Benchmarks for Image Classification and Other High-dimensional Pattern Recognition Problems
Yellamraju, Tarun, Hepp, Jonas, Boutin, Mireille
A good classification method should yield more accurate results than simple heuristics. But there are classification problems, especially high-dimensional ones like the ones based on image/video data, for which simple heuristics can work quite accurately; the structure of the data in such problems is easy to uncover without any sophisticated or computationally expensive method. On the other hand, some problems have a structure that can only be found with sophisticated pattern recognition methods. We are interested in quantifying the difficulty of a given high-dimensional pattern recognition problem. We consider the case where the patterns come from two pre-determined classes and where the objects are represented by points in a high-dimensional vector space. However, the framework we propose is extendable to an arbitrarily large number of classes. We propose classification benchmarks based on simple random projection heuristics. Our benchmarks are 2D curves parameterized by the classification error and computational cost of these simple heuristics. Each curve divides the plane into a "positive-gain" and a "negative-gain" region. The latter contains methods that are ill-suited for the given classification problem. The former is divided into two by the curve asymptote; methods that lie in the small region under the curve but right of the asymptote merely provide a computational gain but no structural advantage over the random heuristics. We prove that the curve asymptotes are optimal (i.e. at Bayes error) in some cases, and thus no sophisticated method can provide a structural advantage over the random heuristics. Such classification problems, an example of which we present in our numerical experiments, provide poor ground for testing new pattern classification methods.
Support Vector Machine Application for Multiphase Flow Pattern Prediction
Guillen-Rondon, Pablo, Robinson, Melvin D., Torres, Carlos, Pereya, Eduardo
In this paper, a data-analytical approach featuring support vector machines (SVM) is employed to train a predictive model over an experimental dataset, which consists of the most relevant studies for two-phase flow pattern prediction. The database for this study consists of flow patterns or flow regimes in gas-liquid two-phase flow. The term flow pattern refers to the geometrical configuration of the gas and liquid phases in the pipe. When gas and liquid flow simultaneously in a pipe, the two phases can distribute themselves in a variety of flow configurations. Gas-liquid two-phase flow occurs ubiquitously in various major industrial fields: petroleum, chemical, nuclear, and geothermal industries. The flow configurations differ from each other in the spatial distribution of the interface, resulting in different flow characteristics. Experimental results obtained by applying the presented methodology to different combinations of flow patterns demonstrate that the proposed approach is competitive with state-of-the-art alternatives, achieving 97% correct classification. The results suggest machine learning could be used as an effective tool for automatic detection and classification of gas-liquid flow patterns.
Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition
Karimi, Hamed, Nutini, Julie, Schmidt, Mark
In 1963, Polyak proposed a simple condition that is sufficient to show a global linear convergence rate for gradient descent. This condition is a special case of the Łojasiewicz inequality proposed in the same year, and it does not require strong convexity (or even convexity). In this work, we show that this much-older Polyak-Łojasiewicz (PL) inequality is actually weaker than the main conditions that have been explored to show linear convergence rates without strong convexity over the last 25 years. We also use the PL inequality to give new analyses of randomized and greedy coordinate descent methods, sign-based gradient descent methods, and stochastic gradient methods in the classic setting (with decreasing or constant step-sizes) as well as the variance-reduced setting. We further propose a generalization that applies to proximal-gradient methods for non-smooth optimization, leading to simple proofs of linear convergence of these methods. Along the way, we give simple convergence results for a wide variety of problems in machine learning: least squares, logistic regression, boosting, resilient backpropagation, L1-regularization, support vector machines, stochastic dual coordinate ascent, and stochastic variance-reduced gradient methods.
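For readers unfamiliar with the condition, the PL inequality and the linear rate it yields for gradient descent can be written out as follows. The abstract does not fix any notation, so the symbols are assumptions of this sketch: $f^*$ is the optimal value, $\mu > 0$ is the PL constant, and $L$ is the Lipschitz constant of $\nabla f$.

```latex
% PL inequality: the squared gradient norm dominates the suboptimality.
\frac{1}{2}\,\|\nabla f(x)\|^2 \;\ge\; \mu\,\bigl(f(x) - f^*\bigr)
\quad \text{for all } x.

% Gradient descent with step size 1/L:
x_{k+1} = x_k - \frac{1}{L}\,\nabla f(x_k).

% L-Lipschitz gradient gives the descent bound; PL then bounds the gradient term:
f(x_{k+1}) \;\le\; f(x_k) - \frac{1}{2L}\,\|\nabla f(x_k)\|^2
           \;\le\; f(x_k) - \frac{\mu}{L}\,\bigl(f(x_k) - f^*\bigr).

% Subtracting f^* from both sides and unrolling the recursion:
f(x_k) - f^* \;\le\; \Bigl(1 - \frac{\mu}{L}\Bigr)^{k}\,\bigl(f(x_0) - f^*\bigr).
```

Neither step uses convexity, which is why the condition can hold for non-convex functions.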
Support Vector Machine -- Introduction to Machine Learning Algorithms
To separate the two classes of data points, there are many possible hyperplanes that could be chosen. Our objective is to find a plane that has the maximum margin, i.e., the maximum distance between data points of both classes. Maximizing the margin distance provides some reinforcement so that future data points can be classified with more confidence. Hyperplanes are decision boundaries that help classify the data points. Data points falling on either side of the hyperplane can be attributed to different classes.
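To make the notion of margin concrete, here is a minimal sketch that measures the margin of two candidate separating hyperplanes on a toy dataset. The data, the hyperplane parameters, and the helper name `geometric_margin` are made up for illustration; an actual SVM solver would search over all hyperplanes rather than compare two fixed ones.

```python
import math


def geometric_margin(w, b, points, labels):
    """Smallest signed distance from any labeled point to the hyperplane
    w . x + b = 0. It is positive only if every point is on its correct side."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, y in zip(points, labels)
    )


# Toy 2D data: class -1 on the line x = 0, class +1 on the line x = 3.
points = [(0.0, 0.0), (0.0, 1.0), (3.0, 0.0), (3.0, 1.0)]
labels = [-1, -1, 1, 1]

# Both hyperplanes separate the classes, but with different margins.
margin_mid = geometric_margin((1.0, 0.0), -1.5, points, labels)  # boundary x = 1.5
margin_off = geometric_margin((1.0, 0.0), -0.5, points, labels)  # boundary x = 0.5

print(margin_mid, margin_off)  # 1.5 0.5
```

The boundary halfway between the classes has the larger margin, so it is the one the maximum-margin criterion prefers.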
Machine learning detects lymphedema in breast cancer survivors
A new study led by NYU Rory Meyers College of Nursing shows that machine learning--combined with the collection of real-time symptom reports using an mHealth system--can provide early detection and help patients to receive timely intervention to effectively manage lymphedema. Lymphedema, which has no cure and comes with lifelong risk, is the build-up of lymph fluid that causes swelling in the arms or legs of patients. In the study of 355 women from 45 states who had undergone treatment for breast cancer, the performance of five machine learning algorithms was evaluated--artificial neural network (ANN), Decision Tree of C4.5, Decision Tree of C5.0, gradient boosting model and support vector machine. According to results published in the journal mHealth, all five machine learning approaches outperformed the conventional statistical approach. However, of the five, the ANN achieved the best performance for detecting lymphedema, with accuracy of 93.75 percent, sensitivity of 95.65 percent and specificity of 91.03 percent.
Machine Learning CICY Threefolds
Bull, Kieran, He, Yang-Hui, Jejjala, Vishnu, Mishra, Challenger
The latest techniques from Neural Networks and Support Vector Machines (SVM) are used to investigate geometric properties of Complete Intersection Calabi-Yau (CICY) threefolds, a class of manifolds that facilitate string model building. An advanced neural network classifier and SVM are employed to (1) learn Hodge numbers and report a remarkable improvement over previous efforts, (2) query for favourability, and (3) predict discrete symmetries, a highly imbalanced problem to which the Synthetic Minority Oversampling Technique (SMOTE) is applied to boost performance. In each case study, we employ a genetic algorithm to optimise the hyperparameters of the neural network. We demonstrate that our approach provides quick diagnostic tools capable of shortlisting quasi-realistic string models based on compactification over smooth CICYs and further supports the paradigm that classes of problems in algebraic geometry can be machine learned.
Anonymous Walk Embeddings
Ivanov, Sergey, Burnaev, Evgeny
The task of representing entire graphs has seen a surge of prominent results, mainly due to learning convolutional neural networks (CNNs) on graph-structured data. While CNNs demonstrate state-of-the-art performance in graph classification tasks, such methods are supervised and therefore steer away from the original problem of network representation in a task-agnostic manner. Here, we coherently propose an approach for embedding entire graphs and show that our feature representations with an SVM classifier increase classification accuracy over CNN algorithms and traditional graph kernels. For this we describe a recently discovered graph object, the anonymous walk, on which we design task-independent algorithms for learning graph representations in an explicit and distributed way. Overall, our work represents a new scalable unsupervised learning of state-of-the-art representations of entire graphs.
Scalable Multi-Class Bayesian Support Vector Machines for Structured and Unstructured Data
Wistuba, Martin, Rawat, Ambrish
We introduce a new Bayesian multi-class support vector machine by formulating a pseudo-likelihood for a multi-class hinge loss in the form of a location-scale mixture of Gaussians. We derive a variational-inference-based training objective for gradient-based learning. Additionally, we employ an inducing point approximation which scales inference to large data sets. Furthermore, we develop hybrid Bayesian neural networks that combine standard deep learning components with the proposed model to enable learning for unstructured data. We provide empirical evidence that our model outperforms the competitor methods with respect to both training time and accuracy in classification experiments on 68 structured and two unstructured data sets. Finally, we highlight the key capability of our model in yielding prediction uncertainty for classification by demonstrating its effectiveness in the tasks of large-scale active learning and detection of adversarial images.
Adversarial Auto-encoders for Speech Based Emotion Recognition
Sahu, Saurabh, Gupta, Rahul, Sivaraman, Ganesh, AbdAlmageed, Wael, Espy-Wilson, Carol
Recently, generative adversarial networks and adversarial autoencoders have gained a lot of attention in the machine learning community due to their exceptional performance in tasks such as digit classification and face recognition. They map the autoencoder's bottleneck layer output (termed code vectors) to different noise Probability Distribution Functions (PDFs), which can be further regularized to cluster based on class information. In addition, they also allow the generation of synthetic samples by sampling the code vectors from the mapped PDFs. Inspired by these properties, we investigate the application of adversarial autoencoders to the domain of emotion recognition. Specifically, we conduct experiments on the following two aspects: (i) their ability to encode high-dimensional feature vector representations for emotional utterances into a compressed space (with a minimal loss of emotion class discriminability in the compressed space), and (ii) their ability to regenerate synthetic samples in the original feature space, to be later used for purposes such as training emotion recognition classifiers. We demonstrate the promise of adversarial autoencoders with regard to these aspects on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) corpus and present our analysis.