efficient second-order online kernel learning
Efficient Second-Order Online Kernel Learning with Adaptive Embedding
Online kernel learning (OKL) is a flexible framework to approach prediction problems, since the large approximation space provided by reproducing kernel Hilbert spaces can contain an accurate function for the problem. Nonetheless, optimizing over this space is computationally expensive. Not only first order methods accumulate $\O(\sqrt{T})$ more loss than the optimal function, but the curse of kernelization results in a $\O(t)$ per step complexity. Second-order methods get closer to the optimum much faster, suffering only $\O(\log(T))$ regret, but second-order updates are even more expensive, with a $\O(t^2)$ per-step cost. Existing approximate OKL methods try to reduce this complexity either by limiting the Support Vectors (SV) introduced in the predictor, or by avoiding the kernelization process altogether using embedding.
Reviews: Efficient Second-Order Online Kernel Learning with Adaptive Embedding
The paper proposes an efficient second-order online kernel learning mainly by combining KONS and Nystrom method. NOVELTY The novelty is limited on both the methodological and theoretical contributions. The achieved results do not have profound implication for the advancement of theory and practice. WRITING QUALITY The English writing and organization of this paper are relatively good. The reviewer strongly suggests the authors arrange Table 2 in the main paper rather than in Appendix because the experimental results in Table 2 are the core material.
Efficient Second-Order Online Kernel Learning with Adaptive Embedding
Calandriello, Daniele, Lazaric, Alessandro, Valko, Michal
Online kernel learning (OKL) is a flexible framework to approach prediction problems, since the large approximation space provided by reproducing kernel Hilbert spaces can contain an accurate function for the problem. Nonetheless, optimizing over this space is computationally expensive. Not only first order methods accumulate $\O(\sqrt{T})$ more loss than the optimal function, but the curse of kernelization results in a $\O(t)$ per step complexity. Second-order methods get closer to the optimum much faster, suffering only $\O(\log(T))$ regret, but second-order updates are even more expensive, with a $\O(t 2)$ per-step cost. Existing approximate OKL methods try to reduce this complexity either by limiting the Support Vectors (SV) introduced in the predictor, or by avoiding the kernelization process altogether using embedding.