1102a326d5f7c9e04fc3c89d0ede88c9-Supplemental.pdf
–Neural Information Processing Systems
This is the distribution over datasets one obtains by first sampling a task t from Pt, and then sampling a dataset S from Pmz|t. Here p(S) corresponds to the marginal distribution over datasets S. Note that the last line above holds because E P f(,S) does not depend on t. Thus, in this section, we present a specialization of the bound for Gaussian distributions. Let P have mean µ and covariance; thus P = N(µ,) and analogously P,0 = N(µ0, 0). We can then apply the analytical form for the KL-divergence between two multivariate Gaussian distributions to the bound presented in Theorem 3. The result is the following bound holding under the same assumptions as Theorem 3: L(P,Pt) 1 l We implement the above bound in code instead of the non-specialized form of the KL divergence to speed up computations and simplify gradient computations. A.3.2 Few-Shot Learning Bound with Validation Data In this section, we will assume that, in addition to the training data S Pmz|t, we have access to validation data Sva Pnz|t at meta-training time. We will show that a meta-learning generalization bound can still be obtained in this case.
Neural Information Processing Systems
Apr-24-2026, 18:28:09 GMT