Export Reviews, Discussions, Author Feedback and Meta-Reviews
For [23] we refer to the paper you pointed to. 5. To Reviewer_25: the overall conceptual motivation for the paper is somewhat weak... The Nyström approximation can be used to approximate the kernel matrix and speed up kernel machines, but Table 1 shows that the performance is suboptimal even when rank=200 (see the 5th column). In this case, 200 inner product computations are required to make one prediction, which is too slow for many real-time systems (e.g., web applications, robotic applications, ...). Therefore the state-of-the-art Nyström method is not good enough; we reduce the prediction cost to 10-20 inner products while achieving better classification accuracy, which is a big improvement. Also, as we mentioned in point 1 above, although we optimize for prediction time, our method still has fast training time. We agree that the pseudo landmark point technique could potentially be applied to speed up training as well, and this is an interesting research direction.
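To make the cost argument concrete, here is a minimal sketch (not the authors' code) of the standard Nyström approximation the rebuttal refers to: with m landmark points, each prediction requires m kernel evaluations against the landmarks, so the rank directly sets the per-prediction cost. All names and the ridge-regression training step below are illustrative assumptions.

```python
import numpy as np

def rbf(X, Y, gamma=1.0):
    # Pairwise RBF kernel between rows of X and rows of Y.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))        # toy training points
y = np.sign(rng.normal(size=500))     # toy labels

m = 20                                # number of landmarks = approximation rank
landmarks = X[rng.choice(len(X), m, replace=False)]

# Nystrom feature map: phi(x) = W^{-1/2} k(landmarks, x),
# where W is the kernel matrix among the landmarks.
W = rbf(landmarks, landmarks)
U, s, _ = np.linalg.svd(W)
W_inv_sqrt = U @ np.diag(1.0 / np.sqrt(s)) @ U.T

Phi = rbf(X, landmarks) @ W_inv_sqrt  # n x m approximate features

# Train any linear model on the features; ridge regression as a stand-in.
w = np.linalg.solve(Phi.T @ Phi + 1e-3 * np.eye(m), Phi.T @ y)

# Predicting one test point costs only m kernel evaluations (inner products).
x_test = rng.normal(size=(1, 10))
score = (rbf(x_test, landmarks) @ W_inv_sqrt) @ w
```

With m = 200 (the Table 1 setting), the same prediction step would need 200 kernel evaluations per test point, which is the cost the rebuttal argues is too high for real-time use.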
Fast Prediction for Large-Scale Kernel Machines
Hsieh, Cho-Jui, Si, Si, Dhillon, Inderjit S.
Kernel machines such as kernel SVM and kernel ridge regression usually construct high quality models; however, their use in real-world applications remains limited due to the high prediction cost. In this paper, we present two novel insights for improving the prediction efficiency of kernel machines. First, we show that by adding “pseudo landmark points” to the classical Nyström kernel approximation in an elegant way, we can significantly reduce the prediction error without much additional prediction cost. Second, we provide a new theoretical analysis on bounding the error of the solution computed by using Nyström kernel approximation method, and show that the error is related to the weighted k-means objective function where the weights are given by the model computed from the original kernel. This theoretical insight suggests a new landmark point selection technique for the situation where we have knowledge of the original model. Based on these two insights, we provide a divide-and-conquer framework for improving the prediction speed. First, we divide the whole problem into smaller local subproblems to reduce the problem size. In the second phase, we develop a kernel approximation based fast prediction approach within each subproblem. We apply our algorithm to real world large-scale classification and regression datasets, and show that the proposed algorithm is consistently and significantly better than other competitors. For example, on the Covertype classification problem, in terms of prediction time, our algorithm achieves more than 10000 times speedup over the full kernel SVM, and a two-fold speedup over the state-of-the-art LDKL approach, while obtaining much higher prediction accuracy than LDKL (95.2% vs. 89.53%).
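The abstract's second insight ties the Nyström approximation error to a weighted k-means objective, with weights taken from the original model. A minimal sketch of that landmark-selection idea, under my own assumptions (Lloyd-style updates, weights set to the magnitudes of hypothetical dual coefficients `alpha`), might look like:

```python
import numpy as np

def weighted_kmeans(X, w, k, iters=20, seed=0):
    # Lloyd's algorithm with per-point weights: the centers (landmarks)
    # minimize sum_i w_i * ||x_i - c_{a(i)}||^2, a weighted k-means objective.
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)].copy()
    for _ in range(iters):
        # Assign each point to its nearest center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = d2.argmin(1)
        # Update each center to the weighted mean of its cluster.
        for j in range(k):
            mask = assign == j
            if mask.any():
                centers[j] = np.average(X[mask], axis=0, weights=w[mask])
    return centers

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 5))
alpha = rng.normal(size=300)            # hypothetical dual coefficients of the original model
landmarks = weighted_kmeans(X, np.abs(alpha), k=10)
```

Points with large |alpha| (influential in the original model) pull the landmarks toward them, which matches the abstract's claim that the approximation error depends on a model-weighted k-means objective rather than a uniform one.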