We provide improved convergence rates for various \emph{non-smooth} optimization problems via higher-order accelerated methods. In the case of $\ell_\infty$ regression, we achieve an $O(\epsilon^{-4/5})$ iteration complexity, breaking the $O(\epsilon^{-1})$ barrier faced by previous methods. We arrive at a similar rate for the problem of $\ell_1$-SVM, going beyond what is attainable by first-order methods with prox-oracle access for non-smooth non-strongly convex problems. We further show how to achieve even faster rates by introducing higher-order regularization. Our results rely on recent advances in near-optimal accelerated methods for higher-order smooth convex optimization. In particular, we extend Nesterov's smoothing technique to show that the standard softmax approximation is not only smooth in the usual sense, but also \emph{higher-order} smooth. With this observation in hand, we provide what is, to the best of our knowledge, the first example of higher-order acceleration techniques yielding faster rates for \emph{non-smooth} optimization.
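As a rough illustration of the softmax (log-sum-exp) smoothing mentioned above, the sketch below approximates the $\ell_\infty$ norm of a residual vector. The function names, the smoothing parameter `mu`, and the toy residual are assumptions made for this example and are not taken from the paper; only the standard bound $\max_i x_i \le \mu \log \sum_i e^{x_i/\mu} \le \max_i x_i + \mu \log n$ is relied on.

```python
import math

def softmax_approx(x, mu):
    """Smooth approximation mu * log(sum_i exp(x_i / mu)) of max(x)."""
    m = max(x)  # shift by the maximum for numerical stability
    return m + mu * math.log(sum(math.exp((xi - m) / mu) for xi in x))

def linf_smooth(residual, mu):
    """Smooth surrogate for the l_inf norm: softmax over r_i and -r_i."""
    return softmax_approx(list(residual) + [-r for r in residual], mu)

# Hypothetical residual vector, for illustration only.
residual = [0.3, -1.2, 0.7]
exact = max(abs(r) for r in residual)
for mu in (1.0, 0.1, 0.01):
    approx = linf_smooth(residual, mu)
    # The gap approx - exact always lies in [0, mu * log(2n)].
    print(mu, exact, approx, approx - exact)
```

Smaller values of `mu` give a tighter approximation at the cost of larger derivative bounds, which is the trade-off that smoothing-based methods have to balance.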

Yesilli, Melih C., Khasawneh, Firas A., Otto, Andreas

Machining processes are most accurately described using complex dynamical systems that include nonlinearities, time delays and stochastic effects. Due to the nature of these models, as well as practical challenges such as time-varying parameters, the transition from numerical/analytical modeling of machining to the analysis of real cutting signals remains challenging. Some studies have analyzed the time series of cutting processes using machine learning algorithms with the goal of identifying and predicting undesirable vibrations during machining, referred to as chatter. These tools typically decompose the signal using Wavelet Packet Transforms (WPT) or Ensemble Empirical Mode Decomposition (EEMD). However, these methods require significant overhead in identifying the feature vectors before a classifier can be trained. In this study, we present an alternative approach based on featurizing the time series of the cutting process using its topological features. We use a support vector machine classifier combined with feature vectors derived from persistence diagrams, a tool from persistent homology, to encode distinguishing characteristics based on embedding the time series as a point cloud using Takens embedding. We present the results for several choices of the topological feature vectors, and we compare our results to the WPT and EEMD methods using experimental time series from a turning cutting test. Our results show that in most cases combining the TDA-based features with a simple Support Vector Machine (SVM) yields accuracies that either exceed or are within the error bounds of their WPT and EEMD counterparts.
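The Takens (delay) embedding step can be sketched as follows. This is a minimal illustration only: the function name, the choice of `dim` and `delay`, and the synthetic sine signal are assumptions for the example, and the persistence-diagram and SVM stages of the pipeline are not shown.

```python
import math

def takens_embedding(series, dim, delay):
    """Map a scalar series to points (x[t], x[t+delay], ..., x[t+(dim-1)*delay])."""
    n_points = len(series) - (dim - 1) * delay
    if n_points <= 0:
        raise ValueError("series too short for this dim and delay")
    return [
        tuple(series[t + k * delay] for k in range(dim))
        for t in range(n_points)
    ]

# Synthetic stand-in for a cutting signal (hypothetical data).
signal = [math.sin(0.2 * t) for t in range(100)]
cloud = takens_embedding(signal, dim=2, delay=8)
print(len(cloud), cloud[0])
```

For a periodic signal such as this one the resulting point cloud traces a closed loop, which is the kind of shape that persistent homology is designed to detect.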

The tutorial starts with an overview of the concepts of VC dimension and structural risk minimization. We then describe linear Support Vector Machines (SVMs) for separable and non-separable data, working through a non-trivial example in detail. We describe a mechanical analogy, and discuss when SVM solutions are unique and when they are global. We describe how support vector training can be practically implemented, and discuss in detail the kernel mapping technique that is used to construct SVM solutions that are nonlinear in the data. We show how Support Vector Machines can have very large (even infinite) VC dimension by computing the VC dimension for homogeneous polynomial and Gaussian radial basis function kernels.
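To make the kernel mapping idea concrete, the sketch below checks numerically that a homogeneous polynomial kernel of degree 2 on two-dimensional inputs equals an ordinary dot product in an explicit three-dimensional feature space. The function names and the sample points are assumptions chosen for this illustration.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def poly_kernel(x, y, p=2):
    """Homogeneous polynomial kernel K(x, y) = (x . y)^p."""
    return dot(x, y) ** p

def feature_map(x):
    """Explicit map for p = 2 in two dimensions: (x1^2, sqrt(2) x1 x2, x2^2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

# Hypothetical sample points.
x, y = (1.0, 2.0), (3.0, -1.0)
print(poly_kernel(x, y))                    # kernel evaluated in input space
print(dot(feature_map(x), feature_map(y)))  # same value up to rounding
```

The kernel evaluation never forms the mapped vectors, which is why the same trick remains usable when the feature space is very high-dimensional or infinite-dimensional.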

The main goal of statistical learning theory is to provide a fundamental framework for the problem of decision making and model construction based on sets of data. Here, we present a brief introduction to the fundamentals of statistical learning theory, in particular the difference between empirical and structural risk minimization, including one of its most prominent implementations, i.e. the Support Vector Machine.
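For orientation, the distinction can be written down in the standard notation of VC theory. The symbols below ($l$ training samples, VC dimension $h$, confidence level $1-\eta$, zero-one loss) are the conventional ones and are assumed here rather than taken from the text above. Empirical risk minimization minimizes only the training error, whereas structural risk minimization also controls the capacity term of the bound:

```latex
% Empirical risk of a classifier f on l labeled samples (zero-one loss)
R_{\mathrm{emp}}(f) = \frac{1}{l} \sum_{i=1}^{l} \mathbf{1}\left[ f(x_i) \neq y_i \right]

% Vapnik's bound, holding with probability at least 1 - \eta,
% for a function class of VC dimension h
R(f) \le R_{\mathrm{emp}}(f)
  + \sqrt{ \frac{ h \left( \ln \frac{2l}{h} + 1 \right) - \ln \frac{\eta}{4} }{ l } }
```

Structural risk minimization chooses, among nested function classes of increasing $h$, the one for which the right-hand side is smallest.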

We present a quantum machine learning algorithm for training a Sparse Support Vector Machine, a linear classifier that minimizes the hinge loss and the $L_1$ norm of the feature weights vector. Sparse SVM results in a classifier that uses only a small fraction of the input features in making decisions, and is especially suitable for cases where the total number of features is of the same order as, or larger than, the number of training samples. The algorithm utilizes recently proposed quantum solvers for semidefinite programming and linear programming problems. We show that while for an arbitrary binary classification problem no quantum speedup is achieved by using quantum SDP/LP solvers during training, there are realistic scenarios in which using a sparse linear classifier makes sense in terms of the expected accuracy of predictions, and polynomial quantum speedup compared to classical methods can be achieved.
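The classical objective that such a classifier minimizes can be sketched as below. This only evaluates the objective; it is not the quantum training algorithm. The function name, the regularization weight `lam`, the use of a sum rather than a mean over samples, and the toy data are all assumptions made for illustration.

```python
def sparse_svm_objective(w, b, X, y, lam):
    """Sum of hinge losses plus lam times the L1 norm of the weight vector."""
    hinge = 0.0
    for xi, yi in zip(X, y):
        score = sum(wj * xj for wj, xj in zip(w, xi)) + b
        hinge += max(0.0, 1.0 - yi * score)  # zero once the margin is at least 1
    l1 = sum(abs(wj) for wj in w)            # penalty that encourages sparse weights
    return hinge + lam * l1

# Toy data (hypothetical): two samples, two features, labels in {+1, -1}.
X = [(1.0, 0.0), (0.0, 1.0)]
y = [1, -1]
print(sparse_svm_objective((1.0, 0.0), 0.0, X, y, lam=0.5))
```

Because both terms are piecewise linear in the weights and bias, minimizing this objective can be posed as a linear program, which is what makes LP solvers applicable to training.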

If you've been at machine learning long enough, you know that there is a "no free lunch" principle -- there's no one-size-fits-all algorithm that will help you solve every problem and tackle every dataset. I work for Springboard -- we've put a lot of research into machine learning training and resources. At Springboard, we offer the first online course with a machine learning job guarantee. What helps a lot when confronted with a new problem is to have a primer for what algorithm might be the best fit for certain situations. Here, we talk about different problems and data types and discuss what might be the most effective algorithm to try for each one, along with a resource that can help you implement that particular model.

Schlag, Sebastian, Schmitt, Matthias, Schulz, Christian

The time complexity of support vector machines (SVMs) prohibits training on huge data sets with millions of samples. Recently, multilevel approaches to train SVMs have been developed to allow for time-efficient training on huge data sets. While regular SVMs perform the entire training in one time-consuming optimization step, multilevel SVMs first build a hierarchy of problems decreasing in size that resemble the original problem and then train an SVM model for each hierarchy level, benefiting from the solved models of previous levels. We present a faster multilevel support vector machine that uses a label propagation algorithm to construct the problem hierarchy. Extensive experiments show that our new algorithm achieves speed-ups of up to two orders of magnitude while having similar or better classification quality compared to state-of-the-art algorithms.

Machine learning – The ability of computers to improve their performance on tasks such as pattern and text recognition by learning from data. Over time, as it has more reference data, the machine learns to become more efficient. Natural-language processing – A process that deals with a computer's ability to analyze language through speech recognition, semantics and syntax. Just as a human learns a language through listening and reading while understanding the context, computers can attain a similar capability. Deep learning – A specialized form of machine learning, deep learning is the ability of a computer to process various pieces of information the way a human would to make informed decisions and judgements.

van Laarhoven, Twan, Marchiori, Elena

Domain adaptation (DA) is the task of classifying an unlabeled dataset (target) using a labeled dataset (source) from a related domain. The majority of successful DA methods try to directly match the distributions of the source and target data by transforming the feature space. Despite their success, state-of-the-art methods based on this approach are either involved or unable to directly scale to data with many features. This article shows that domain adaptation can be successfully performed by using a very simple randomized expectation maximization (EM) method. We consider two instances of the method, which involve logistic regression and support vector machines, respectively. The underlying assumption of the proposed method is the existence of a good single linear classifier for both source and target domain. The potential limitations of this assumption are alleviated by the flexibility of the method, which can directly incorporate deep features extracted from a pre-trained deep neural network. The resulting algorithm is strikingly easy to implement and apply. We test its performance on 36 real-life adaptation tasks over text and image data with diverse characteristics. The method achieves state-of-the-art results, competitive with those of involved end-to-end deep transfer-learning methods.

Júnior, Pedro Ribeiro Mendes, Boult, Terrance E., Wainer, Jacques, Rocha, Anderson

Often, when dealing with real-world recognition problems, we do not need, and often cannot have, knowledge of the entire set of possible classes that might appear during operational testing. Moreover, some of these classes may be ill-sampled, not sampled at all or undefined. In such cases, we need robust classification methods able to deal with the "unknown" and properly reject samples belonging to classes never seen during training. Nevertheless, almost all existing classifiers were developed for the closed-set scenario, i.e., the classification setup in which it is assumed that all test samples belong to one of the classes with which the classifier was trained. In the open-set scenario, however, a test sample can belong to none of the known classes and the classifier must properly reject it by classifying it as unknown. In this work, we extend the well-known Support Vector Machine (SVM) classifier and introduce the Specialized Support Vector Machine (SSVM), which is suitable for recognition in open-set setups. SSVM balances the empirical risk and the risk of the unknown and ensures that the region of the feature space in which a test sample would be classified as known (one of the known classes) is always bounded, ensuring a finite risk of the unknown. The same cannot be guaranteed by the traditional SVM formulation, even when using the Radial Basis Function (RBF) kernel. In this work, we also highlight the properties of the SVM classifier related to the open-set scenario, and provide necessary and sufficient conditions for an RBF SVM to have bounded open-space risk. An extensive set of experiments compares the proposed method with existing solutions in the literature for open-set recognition, and the reported results show its effectiveness.
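One way to see why an RBF kernel alone does not bound the "known" region is to evaluate the decision function far from every support vector: each kernel term decays to zero, so the decision value approaches the bias term and its sign decides the label everywhere at a distance. The sketch below illustrates this generic property of the RBF decision function. It is not the SSVM training procedure, and the support vectors, coefficients, `gamma` and bias values are hypothetical.

```python
import math

def rbf_decision(x, support_vectors, coeffs, b, gamma):
    """f(x) = sum_i coeffs[i] * exp(-gamma * ||x - sv_i||^2) + b,
    where coeffs[i] stands for alpha_i * y_i."""
    total = b
    for sv, c in zip(support_vectors, coeffs):
        sq_dist = sum((a - s) ** 2 for a, s in zip(x, sv))
        total += c * math.exp(-gamma * sq_dist)
    return total

# Hypothetical trained model parameters.
svs = [(0.0, 0.0), (1.0, 1.0)]
coeffs = [1.5, -1.0]
far_point = (100.0, 100.0)

# Far from all support vectors the kernel terms vanish and f(x) is about b.
f_neg = rbf_decision(far_point, svs, coeffs, b=-0.2, gamma=0.5)
f_pos = rbf_decision(far_point, svs, coeffs, b=0.2, gamma=0.5)
print(f_neg, f_pos)
```

With a negative bias the far-away point is rejected as not belonging to the positive class, while with a positive bias the positive region extends without bound.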