A Randomized Algorithm for Large Scale Support Vector Learning
Kumar, Krishnan, Bhattacharya, Chiru, Hariharan, Ramesh
–Neural Information Processing Systems
This paper investigates the application of randomized algorithms for large scale SVM learning. The key contribution of the paper is to show that, by using ideas random projections, the minimal number of support vectors required to solve almost separableclassification problems, such that the solution obtained is near optimal with a very high probability, is given by O(log n); if on removal of properly chosenO(log n) points the data becomes linearly separable then it is called almost separable. The second contribution is a sampling based algorithm, motivated fromrandomized algorithms, which solves a SVM problem by considering subsets of the dataset which are greater in size than the number of support vectors for the problem. These two ideas are combined to obtain an algorithm for SVM classification problems which performs the learning by considering only O(log n) points at a time. Experiments done on synthetic and real life datasets show that the algorithm does scale up state of the art SVM solvers in terms of memory required and execution time without loss in accuracy. It is to be noted that the algorithm presented here nicely complements existing large scale SVM learning approaches as it can be used to scale up any SVM solver.
Neural Information Processing Systems
Dec-31-2008