Goto

Collaborating Authors

 Support Vector Machines


Analysis of a Random Forests Model

arXiv.org Machine Learning

In a series of papers and technical reports, Breiman [9, 10, 11, 12] demonstrated that substantial gains in classification and regression accuracy can be achieved by using ensembles of trees, where each tree in the ensemble is grown in accordance with a random parameter. Final predictions are obtained by aggregating over the ensemble. As the base constituents of the ensemble are tree-structured predictors, and since each of these trees is constructed using an injection of randomness, these procedures are called "random forests". Breiman's ideas were decisively influenced by the early work of Amit and Geman [3] on geometric feature selection, the random subspace method of Ho [27] and the random split selection approach of Dietterich [21]. As highlighted by various empirical studies (see [11, 36, 20, 24, 25] for instance), random forests have emerged as serious competitors to state-of-the-art methods such as boosting (Freund [22]) and support vector machines (Shawe-Taylor and Cristianini [35]).


Random Feature Maps for Dot Product Kernels

arXiv.org Machine Learning

Approximating non-linear kernels using feature maps has gained a lot of interest in recent years due to applications in reducing training and testing times of SVM classifiers and other kernel based learning algorithms. We extend this line of work and present low distortion embeddings for dot product kernels into linear Euclidean spaces. We base our results on a classical result in harmonic analysis characterizing all dot product kernels and use it to define randomized feature maps into explicit low dimensional Euclidean spaces in which the native dot product provides an approximation to the dot product kernel with high confidence.


Asymptotic Confidence Sets for General Nonparametric Regression and Classification by Regularized Kernel Methods

arXiv.org Machine Learning

Regularized kernel methods such as, e.g., support vector machines and least-squares support vector regression constitute an important class of standard learning algorithms in machine learning. Theoretical investigations concerning asymptotic properties have manly focused on rates of convergence during the last years but there are only very few and limited (asymptotic) results on statistical inference so far. As this is a serious limitation for their use in mathematical statistics, the goal of the article is to fill this gap. Based on asymptotic normality of many of these methods, the article derives a strongly consistent estimator for the unknown covariance matrix of the limiting normal distribution. In this way, we obtain asymptotically correct confidence sets for $\psi(f_{P,\lambda_0})$ where $f_{P,\lambda_0}$ denotes the minimizer of the regularized risk in the reproducing kernel Hilbert space $H$ and $\psi:H\rightarrow\mathds{R}^m$ is any Hadamard-differentiable functional. Applications include (multivariate) pointwise confidence sets for values of $f_{P,\lambda_0}$ and confidence sets for gradients, integrals, and norms.


Automatic Tuning of Interactive Perception Applications

arXiv.org Machine Learning

Interactive applications incorporating high-data rate sensing and computer vision are becoming possible due to novel runtime systems and the use of parallel computation resources. To allow interactive use, such applications require careful tuning of multiple application parameters to meet required fidelity and latency bounds. This is a nontrivial task, often requiring expert knowledge, which becomes intractable as resources and application load characteristics change. This paper describes a method for automatic performance tuning that learns application characteristics and effects of tunable parameters online, and constructs models that are used to maximize fidelity for a given latency constraint. The paper shows that accurate latency models can be learned online, knowledge of application structure can be used to reduce the complexity of the learning task, and operating points can be found that achieve 90% of the optimal fidelity by exploring the parameter space only 3% of the time.


Differential Privacy for Functions and Functional Data

arXiv.org Machine Learning

Differential privacy is a framework for privately releasing summaries of a database. Previous work has focused mainly on methods for which the output is a finite dimensional vector, or an element of some discrete set. We develop methods for releasing functions while preserving differential privacy. Specifically, we show that adding an appropriate Gaussian process to the function of interest yields differential privacy. When the functions lie in the same RKHS as the Gaussian process, then the correct noise level is established by measuring the "sensitivity" of the function in the RKHS norm. As examples we consider kernel density estimation, kernel support vector machines, and functions in reproducing kernel Hilbert spaces.


Modeling Spread of Disease from Social Interactions

AAAI Conferences

Research in computational epidemiology to date has concentrated on coarse-grained statistical analysis of populations, often synthetic ones. By contrast, this paper focuses on fine-grained modeling of the spread of infectious diseases throughout a large real-world social network. Specifically, we study the roles that social ties and interactions between specific individuals play in the progress of a contagion. We focus on public Twitter data, where we find that for every health-related message there are more than 1,000 unrelated ones. This class imbalance makes classification particularly challenging. Nonetheless, we present a framework that accurately identifies sick individuals from the content of online communication. Evaluation on a sample of 2.5 million geo-tagged Twitter messages shows that social ties to infected, symptomatic people, as well as the intensity of recent co-location, sharply increase one's likelihood of contracting the illness in the near future. To our knowledge, this work is the first to model the interplay of social activity, human mobility, and the spread of infectious disease in a large real-world population. Furthermore, we provide the first quantifiable estimates of the characteristics of disease transmission on a large scale without active user participation---a step towards our ability to model and predict the emergence of global epidemics from day-to-day interpersonal interactions.


MAV Stabilization using Machine Learning and Onboard Sensors

arXiv.org Artificial Intelligence

Past automation work with miniature aerial vehicles (MAVs) at Cornell has produced interesting results [1] and presented additional challenges. During past projects, results have often been limited not by insufficiencies in planning algorithms, but by navigation errors stemming from inadequate control in the face of realistic, breezy operating environments. In many cases the MAVs will simply drift off the desired path (Figure 1). Thus, this project focuses on refining the basic motion of the same platform, and in particular, minimizing its drift. Our work focuses on reduction of low frequency drift in gps-denied environments. Similar work has been done, some using neural networks [4] or using adaptive-fuzzy control methods [5] to stabilize a quadrotor. Though this research has produced promising results, these methods were demonstrated only in simulation, not via live testing. 1 Figure 1: Desired path vs. actual path due to drift.


Smoothing Multivariate Performance Measures

arXiv.org Machine Learning

A Support Vector Method for multivariate performance measures was recently introduced by Joachims (2005). The underlying optimization problem is currently solved using cutting plane methods such as SVM-Perf and BMRM. One can show that these algorithms converge to an eta accurate solution in O(1/Lambda*e) iterations, where lambda is the trade-off parameter between the regularizer and the loss function. We present a smoothing strategy for multivariate performance scores, in particular precision/recall break-even point and ROCArea. When combined with Nesterov's accelerated gradient algorithm our smoothing strategy yields an optimization algorithm which converges to an eta accurate solution in O(min{1/e,1/sqrt(lambda*e)}) iterations. Furthermore, the cost per iteration of our scheme is the same as that of SVM-Perf and BMRM. Empirical evaluation on a number of publicly available datasets shows that our method converges significantly faster than cutting plane methods without sacrificing generalization ability.


Hierarchical Maximum Margin Learning for Multi-Class Classification

arXiv.org Machine Learning

Due to myriads of classes, designing accurate and efficient classifiers becomes very challenging for multi-class classification. Recent research has shown that class structure learning can greatly facilitate multi-class learning. In this paper, we propose a novel method to learn the class structure for multi-class classification problems. The class structure is assumed to be a binary hierarchical tree. To learn such a tree, we propose a maximum separating margin method to determine the child nodes of any internal node. The proposed method ensures that two classgroups represented by any two sibling nodes are most separable. In the experiments, we evaluate the accuracy and efficiency of the proposed method over other multi-class classification methods on real world large-scale problems. The results show that the proposed method outperforms benchmark methods in terms of accuracy for most datasets and performs comparably with other class structure learning methods in terms of efficiency for all datasets.


New Probabilistic Bounds on Eigenvalues and Eigenvectors of Random Kernel Matrices

arXiv.org Machine Learning

Kernel methods are successful approaches for different machine learning problems. This success is mainly rooted in using feature maps and kernel matrices. Some methods rely on the eigenvalues/eigenvectors of the kernel matrix, while for other methods the spectral information can be used to estimate the excess risk. An important question remains on how close the sample eigenvalues/eigenvectors are to the population values. In this paper, we improve earlier results on concentration bounds for eigenvalues of general kernel matrices. Meanwhile, the obstacles for sharper bounds are accounted for and partially addressed. As a case study, we derive a concentration inequality for sample kernel target-alignment. 1 INTRODUCTION Kernel methods such as Spectral Clustering, Kernel Principal Component Analysis(KPCA), and Support Vector Machines, are successful approaches in many practical machine learning and data analysis problems (Steinwart & Christmann, 2008). The main ingredient of these methods is the kernel matrix, which is built using the kernel function, evaluated at given sample points.