Goto

Collaborating Authors

 Support Vector Machines


Learning Monotonic Transformations for Classification

Neural Information Processing Systems

A discriminative method is proposed for learning monotonic transformations of the training data while jointly estimating a large-margin classifier. In many domains such as document classification, image histogram classification and gene microarray experiments, fixed monotonic transformations can be useful as a preprocessing step. However, most classifiers only explore these transformations through manual trial and error or via prior domain knowledge. The proposed method learns monotonic transformations automatically while training a large-margin classifier without any prior knowledge of the domain. A monotonic piecewise linear function is learned which transforms data for subsequent processing by a linear hyperplane classifier. Two algorithmic implementations of the method are formalized. The first solves a convergent alternating sequence of quadratic and linear programs until it obtains a locally optimal solution. An improved algorithm is then derived using a convex semidefinite relaxation that overcomes initialization issues in the greedy optimization problem. The effectiveness of these learned transformations on synthetic problems, text data and image data is demonstrated.


Bundle Methods for Machine Learning

Neural Information Processing Systems

We present a globally convergent method for regularized risk minimization problems. Our method applies to Support Vector estimation, regression, Gaussian Processes, and any other regularized risk minimization setting which leads to a convex optimization problem. SVMPerf can be shown to be a special case of our approach. In addition to the unified framework we present tight convergence bounds, which show that our algorithm converges in O(1/ɛ) steps to ɛ precision for general convex problems and in O(log(1/ɛ)) steps for continuously differentiable problems. We demonstrate in experiments the performance of our approach.


Random Features for Large-Scale Kernel Machines

Neural Information Processing Systems

To accelerate the training of kernel machines, we propose to map the input data to a randomized low-dimensional feature space and then apply existing fast linear methods. The features are designed so that the inner products of the transformed data are approximately equal to those in the feature space of a user specified shiftinvariant kernel. We explore two sets of random features, provide convergence bounds on their ability to approximate various radial basis kernels, and show that in large-scale classification and regression tasks linear machine learning algorithms applied to these features outperform state-of-the-art large-scale kernel machines.


Learning Monotonic Transformations for Classification

Neural Information Processing Systems

A discriminative method is proposed for learning monotonic transformations of the training data while jointly estimating a large-margin classifier. In many domains such as document classification, image histogram classification and gene microarray experiments, fixed monotonic transformations can be useful as a preprocessing step. However, most classifiers only explore these transformations through manual trial and error or via prior domain knowledge. The proposed method learns monotonic transformations automatically while training a large-margin classifier without any prior knowledge of the domain. A monotonic piecewise linear function is learned which transforms data for subsequent processing by a linear hyperplane classifier. Two algorithmic implementations of the method are formalized. The first solves a convergent alternating sequence of quadratic and linear programs until it obtains a locally optimal solution. An improved algorithm is then derived using a convex semidefinite relaxation that overcomes initialization issues in the greedy optimization problem. The effectiveness of these learned transformations on synthetic problems, text data and image data is demonstrated.


Configuration Estimates Improve Pedestrian Finding

Neural Information Processing Systems

Fair discriminative pedestrian finders are now available. In fact, these pedestrian finders make most errors on pedestrians in configurations that are uncommon in the training data, for example, mounting a bicycle. This is undesirable. However, the human configuration can itself be estimated discriminatively using structure learning. We demonstrate a pedestrian finder which first finds the most likely human pose in the window using a discriminative procedure trained with structure learning on a small dataset. We then present features (local histogram of oriented gradient and local PCA of gradient) based on that configuration to an SVM classifier. We show, using the INRIA Person dataset, that estimates of configuration significantly improve the accuracy of a discriminative pedestrian finder.


Multi-Task Learning via Conic Programming

Neural Information Processing Systems

When we have several related tasks, solving them simultaneously is shown to be more effective than solving them individually. This approach is called multi-task learning (MTL) and has been studied extensively. Existing approaches to MTL often treat all the tasks as \emph{uniformly related to each other and the relatedness of the tasks is controlled globally. For this reason, the existing methods can lead to undesired solutions when some tasks are not highly related to each other, and some pairs of related tasks can have significantly different solutions. In this paper, we propose a novel MTL algorithm that can overcome these problems. Our method makes use of a task network, which describes the relation structure among tasks. This allows us to deal with intricate relation structures in a systematic way. Furthermore, we control the relatedness of the tasks locally, so all pairs of related tasks are guaranteed to have similar solutions. We apply the above idea to support vector machines (SVMs) and show that the optimization problem can be cast as a second order cone program, which is convex and can be solved efficiently. The usefulness of our approach is demonstrated through simulations with protein super-family classification and ordinal regression problems.


Efficient multiple hyperparameter learning for log-linear models

Neural Information Processing Systems

Using multiple regularization hyperparameters is an effective method for managing model complexity in problems where input features have varying amounts of noise. While algorithms for choosing multiple hyperparameters are often used in neural networks and support vector machines, they are not common in structured prediction tasks, such as sequence labeling or parsing. In this paper, we consider the problem of learning regularization hyperparameters for log-linear models, a class of probabilistic models for structured prediction tasks which includes conditional random fields (CRFs). Using an implicit differentiation trick, we derive an efficient gradient-based method for learning Gaussian regularization priors with multiple hyperparameters. In both simulations and the real-world task of computational RNA secondary structure prediction, we find that multiple hyperparameter learning provides a significant boost in accuracy compared to models learned using only a single regularization hyperparameter.


A Randomized Algorithm for Large Scale Support Vector Learning

Neural Information Processing Systems

This paper investigates the application of randomized algorithms for large scale SVM learning. The key contribution of the paper is to show that, by using ideas random projections, the minimal number of support vectors required to solve almost separableclassification problems, such that the solution obtained is near optimal with a very high probability, is given by O(log n); if on removal of properly chosenO(log n) points the data becomes linearly separable then it is called almost separable. The second contribution is a sampling based algorithm, motivated fromrandomized algorithms, which solves a SVM problem by considering subsets of the dataset which are greater in size than the number of support vectors for the problem. These two ideas are combined to obtain an algorithm for SVM classification problems which performs the learning by considering only O(log n) points at a time. Experiments done on synthetic and real life datasets show that the algorithm does scale up state of the art SVM solvers in terms of memory required and execution time without loss in accuracy. It is to be noted that the algorithm presented here nicely complements existing large scale SVM learning approaches as it can be used to scale up any SVM solver.


Efficient Convex Relaxation for Transductive Support Vector Machine

Neural Information Processing Systems

We consider the problem of Support Vector Machine transduction, which involves a combinatorial problem with exponential computational complexity in the number of unlabeled examples. Although several studies are devoted to Transductive SVM, they suffer either from the high computation complexity or from the solutions of local optimum. To address this problem, we propose solving Transductive SVM via a convex relaxation, which converts the NP-hard problem to a semi-definite programming. Compared with the other SDP relaxation for Transductive SVM, the proposed algorithm is computationally more efficient with the number of free parameters reduced from O(n2) to O(n) where n is the number of examples. Empirical study with several benchmark data sets shows the promising performance of the proposed algorithm in comparison with other state-of-the-art implementations of Transductive SVM.


Bundle Methods for Machine Learning

Neural Information Processing Systems

We present a globally convergent method for regularized risk minimization problems. Ourmethod applies to Support Vector estimation, regression, Gaussian Processes, and any other regularized risk minimization setting which leads to a convex optimization problem. SVMPerf can be shown to be a special case of our approach. In addition to the unified framework we present tight convergence bounds, which show that our algorithm converges in O(1/ɛ) steps to ɛ precision for general convex problems and in O(log(1/ɛ)) steps for continuously differentiable problems.We demonstrate in experiments the performance of our approach.