Kernel Methods Through the Roof: Handling Billions of Points Efficiently

Neural Information Processing Systems

Kernel methods provide an elegant and principled approach to nonparametric learning, but so far they could hardly be used on large-scale problems, since naïve implementations scale poorly with data size. Recent advances have shown the benefits of a number of algorithmic ideas, for example combining optimization, numerical linear algebra, and random projections. Here, we push these efforts further to develop and test a solver that takes full advantage of GPU hardware. Towards this end, we designed a preconditioned gradient solver for kernel methods that exploits both GPU acceleration and parallelization across multiple GPUs, implementing out-of-core variants of common linear algebra operations to guarantee optimal hardware utilization. Further, we optimize the numerical precision of different operations and maximize the efficiency of matrix-vector multiplications. As a result, we experimentally show dramatic speedups on datasets with billions of points, while still guaranteeing state-of-the-art performance. Additionally, we make our software available as an easy-to-use library.
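The core solver structure the abstract describes — a preconditioned iterative solver for the regularized kernel system — can be sketched in a few lines. The sketch below is a minimal CPU/NumPy illustration, not the paper's implementation: it uses a simple Jacobi (diagonal) preconditioner as a stand-in for the paper's more elaborate Nyström-style preconditioning, and all function names are my own.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    # Pairwise squared distances via ||a-b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    d2 = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-d2 / (2 * sigma**2))

def pcg(matvec, b, precond, tol=1e-6, max_iter=500):
    # Preconditioned conjugate gradient for a symmetric positive-definite
    # system: only matrix-vector products are needed, which is what makes
    # GPU/out-of-core variants of this scheme possible at scale.
    x = np.zeros_like(b)
    r = b - matvec(x)
    z = precond(r)
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = matvec(p)
        step = rz / (p @ Ap)
        x += step * p
        r -= step * Ap
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            break
        z = precond(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

# Tiny synthetic kernel ridge regression problem (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

lam = 1e-3
n = len(X)
K = gaussian_kernel(X, X)
A = K + lam * n * np.eye(n)      # regularized kernel matrix

# Jacobi preconditioner: a cheap stand-in for a Nystroem preconditioner.
d = np.diag(A)
alpha = pcg(lambda v: A @ v, y, lambda r: r / d)
```

At scale, `K` is never materialized as a dense in-memory matrix; the matrix-vector product inside `pcg` is instead computed block-wise across GPUs, which is the out-of-core aspect the abstract refers to.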


Review for NeurIPS paper: Kernel Methods Through the Roof: Handling Billions of Points Efficiently

Neural Information Processing Systems

Weaknesses: In my opinion, comparing Nyström-based kernel ridge regression to variational GPs is an apples-to-oranges comparison that is unfair to variational GPs. A much more appropriate comparison would be a KeOps-based implementation of SGPR or FITC with fixed inducing points. Variational GPs introduce a very large number of parameters, in the form of the variational distribution and the inducing-point locations, that require optimization and significantly increase the total time spent in optimization. Methods that train GPs through the marginal likelihood with fixed inducing locations (e.g., as in Nyström) may have as few as 3 parameters to fit. By contrast, SVGP learns (1) a variational distribution q(u), including a variational covariance matrix, and (2) the inducing-point locations.
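The reviewer's parameter-count argument can be made concrete with some rough arithmetic. The counts below are illustrative assumptions (m inducing points in d dimensions, 3 kernel hyperparameters such as lengthscale, signal variance, and noise variance), not numbers from the paper or the review.

```python
def svgp_params(m, d):
    # Parameters SVGP optimizes, per the reviewer's breakdown:
    mean = m                   # variational mean of q(u)
    cov = m * (m + 1) // 2     # lower-triangular factor of the variational covariance
    inducing = m * d           # learned inducing-point locations
    hypers = 3                 # assumed kernel hyperparameters (illustrative)
    return mean + cov + inducing + hypers

def fixed_nystroem_params(m, d):
    # Inducing points fixed (e.g. subsampled from the data): only the
    # kernel hyperparameters are optimized, regardless of m and d.
    return 3

print(svgp_params(1000, 10))          # -> 511503
print(fixed_nystroem_params(1000, 10))  # -> 3
```

With m = 1000 inducing points in 10 dimensions, SVGP optimizes on the order of half a million parameters while the fixed-inducing-point approach optimizes a constant handful, which is the asymmetry the reviewer says makes the timing comparison unfair.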


Review for NeurIPS paper: Kernel Methods Through the Roof: Handling Billions of Points Efficiently

Neural Information Processing Systems

There is a consensus among the knowledgeable reviewers that this work makes a significant contribution to the kernel community. It integrates several practical techniques and engineering efforts to further improve the scalability of kernel machines. The proposed techniques will permit the use of several GPUs in training kernel-based models on huge amounts of data, which I also see as a significant contribution. Regardless of the overall score, I think this paper deserves an oral because it shows how to take full advantage of GPU hardware when solving learning problems with kernel methods. Scalability is one of the long-standing problems in kernel machines but has been largely neglected and under-appreciated in the past few years.

