Neural Information Processing Systems
As a "warm-up", and because it is of independent interest, we first study an adaptation algorithm that picks the single best kernel from the meta tasks.

Definition 7 (Adaptation by choosing the single best kernel). Given a set of base kernels $\{k_1, \dots, k_N\}$, the kernel $\hat{k} = \arg\max_{i \in [N]} \hat{J}_{\lambda}^{n_e}(S_P^{tr}, S_Q^{tr}; k_i)$ is called the best-kernel adaptation.

Proposition 3 shows uniform convergence of $\hat{J}_{\lambda}$ for direct adaptation of a kernel class, whether through a deep kernel or through multiple kernel learning. For our analysis of choosing the best single kernel, however, we only need uniform convergence over a finite set, for which we can obtain a slightly better rate. Let $\{k_i\}_{i=1}^{N}$ be the set of base kernels, with power criteria $J_i = J(P, Q; k_i)$ on the corresponding distributions, and let $s_0 = \min_{i \in [N]} \sigma^2_{H_1}(P, Q; k_i)$.
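The selection rule in Definition 7 can be illustrated with a minimal NumPy sketch. It rests on several assumptions not fixed by this section: the estimator $\hat{J}_{\lambda}$ is taken to be the regularized ratio $\widehat{\mathrm{MMD}}^2_u / \sqrt{\hat{\sigma}^2_{H_1} + \lambda}$ with the usual plug-in variance estimate, the samples $S_P^{tr}$ and $S_Q^{tr}$ are assumed to have equal size, and the base kernels are hypothetical Gaussian kernels with different bandwidths rather than kernels obtained from meta tasks. The names `gaussian_kernel`, `power_criterion`, and `choose_best_kernel` are illustrative only. The function also reports the smallest estimated variance as an empirical stand-in for $s_0$, which the text defines on the population distributions.

```python
import numpy as np


def gaussian_kernel(X, Y, bandwidth):
    """Hypothetical base kernel: K[i, j] = exp(-||x_i - y_j||^2 / (2 * bandwidth^2))."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))


def power_criterion(SP, SQ, kernel, lam=1e-8):
    """Assumed form J_lambda = MMD_u^2 / sqrt(sigma_H1^2 + lam); returns (J, sigma^2)."""
    n = SP.shape[0]  # assumes paired samples of equal size n >= 2
    Kxx, Kyy, Kxy = kernel(SP, SP), kernel(SQ, SQ), kernel(SP, SQ)
    # H[i, j] = k(x_i, x_j) + k(y_i, y_j) - k(x_i, y_j) - k(x_j, y_i)
    H = Kxx + Kyy - Kxy - Kxy.T
    # Unbiased MMD^2 estimate: mean of H over off-diagonal pairs (i != j)
    mmd2 = (H.sum() - np.trace(H)) / (n * (n - 1))
    # Plug-in estimate of the asymptotic variance under H1
    row_means = H.sum(axis=1) / n
    var = 4.0 * (np.mean(row_means**2) - np.mean(row_means) ** 2)
    var = max(var, 0.0)  # guard against tiny negative round-off
    return mmd2 / np.sqrt(var + lam), var


def choose_best_kernel(SP, SQ, kernels, lam=1e-8):
    """Return argmax_i J_hat(SP, SQ; k_i), all scores, and min_i sigma_hat^2 (empirical s_0)."""
    results = [power_criterion(SP, SQ, k, lam) for k in kernels]
    scores = [J for J, _ in results]
    best = int(np.argmax(scores))
    s0_hat = min(var for _, var in results)
    return best, scores, s0_hat
```

For instance, three hypothetical base kernels could be built with `functools.partial(gaussian_kernel, bandwidth=b)` for `b` in `(0.1, 1.0, 10.0)` and passed to `choose_best_kernel` together with the two training samples. Because the candidate set is finite, the selection is a plain argmax over $N$ scores, which mirrors why only uniform convergence over a finite set is needed in the analysis.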
Apr-25-2026, 07:55:47 GMT