A Proofs

In this section, we provide proofs of the main theorems presented in the paper. The proofs establish two results: the "task specific generalization bound", which bounds the generalization error averaged over all observed tasks, and the "task environment generalization bound", which bounds the transfer error from the observed tasks to the task environment. Combining Eq.(15) with Eq.(16), it is then straightforward to obtain Eq.(4), with the constant $C(\delta, \lambda, \beta, n, m)$.

For the "task environment generalization bound", we first define the corresponding "meta-training" generalization error. The "task environment generalization bound" is the same as the one in Theorem 2, because that part of the derivation is unchanged. The inequality follows from Jensen's inequality applied to the concave logarithm (the generic form is recalled below). Therefore, we can rewrite Eq.(24) in the form of the implicit gradient $\frac{d(\mathrm{PacB})}{dp}$ (a minimal numerical illustration of this pattern is given below). The Monte-Carlo gradient estimator of the second term of Eq.(25) has the same high-variance problem as the policy gradient method, which causes unreliable inference without a warm start (see the sketch below).

The pseudocode of PACMAML is shown in Algorithm 1. Each iteration in the PACOH and PACMAML settings takes about 0.03-0.06 s. The results for PACOH and PACMAML are obtained from the same set of experiments as Figures 1-4. In Figure 1, we show the comparison between the total bounds of PACOH and PACMAML.
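For reference, the Jensen step above relies only on the concavity of the logarithm (the specific random variable it is applied to in the proof is not reproduced here). In its generic form,
$$\mathbb{E}[\log X] \;\le\; \log \mathbb{E}[X],$$
so the logarithm can be moved outside an expectation at the cost of only an inequality in the bound.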
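As a minimal numerical illustration of the implicit-gradient pattern referenced at Eq.(24) (this is not the paper's bound; the quadratic inner objective $g$ and outer loss $f$ below are hypothetical stand-ins), the implicit function theorem gives $dq^*/dp = -(\partial^2 g/\partial q^2)^{-1}\,\partial^2 g/\partial q\,\partial p$ at the inner optimum:

```python
import numpy as np

# Hypothetical inner objective g(q, p) = 0.5*(q - p)**2 + 0.5*lam*q**2,
# whose minimizer is q*(p) = p / (1 + lam); hypothetical outer loss f(q) = q**2.
lam, p = 0.5, 2.0

q_star = p / (1.0 + lam)        # inner optimum (closed form in this toy case)

# Implicit function theorem at the inner optimum:
#   dq*/dp = -(d2g/dq2)^{-1} * d2g/dqdp
d2g_dq2 = 1.0 + lam             # second derivative of g in q
d2g_dqdp = -1.0                 # mixed second derivative of g
dq_dp = -d2g_dqdp / d2g_dq2     # = 1 / (1 + lam)

# Chain rule for the outer loss L(p) = f(q*(p)) with f(q) = q**2.
dL_dp = 2.0 * q_star * dq_dp

# Finite-difference check of the implicit gradient.
eps = 1e-6
num = ((p + eps) / (1 + lam)) ** 2 - ((p - eps) / (1 + lam)) ** 2
print(dL_dp, num / (2 * eps))   # both ~ 2*p / (1 + lam)**2
```

The key point is that the outer gradient is obtained through the stationarity condition of the inner problem, without differentiating through the inner optimization trajectory.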
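To illustrate the high-variance issue mentioned for Eq.(25), the following self-contained sketch (not the paper's estimator; the toy objective and Gaussian sampling distribution are illustrative assumptions) compares a score-function estimator, of the kind used in policy gradient methods, with a reparameterized estimator of the same gradient:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy objective: J(mu) = E_{x ~ N(mu, 1)}[f(x)] with f(x) = x**2.
# Closed form: J(mu) = mu**2 + 1, so dJ/dmu = 2*mu.
def f(x):
    return x ** 2

mu, n_samples, n_trials = 1.5, 100, 1000

score_grads, reparam_grads = [], []
for _ in range(n_trials):
    x = rng.normal(mu, 1.0, size=n_samples)
    # Score-function (policy-gradient-style) estimator:
    #   dJ/dmu = E[f(x) * d/dmu log N(x; mu, 1)] = E[f(x) * (x - mu)]
    score_grads.append(np.mean(f(x) * (x - mu)))
    # Reparameterization x = mu + eps gives dJ/dmu = E[f'(mu + eps)] = E[2*x].
    reparam_grads.append(np.mean(2.0 * x))

print(f"true gradient:   {2 * mu:.3f}")
print(f"score-function:  mean={np.mean(score_grads):.3f}, std={np.std(score_grads):.3f}")
print(f"reparameterized: mean={np.mean(reparam_grads):.3f}, std={np.std(reparam_grads):.3f}")
```

Both estimators are unbiased, but the score-function estimator's standard deviation is far larger at the same sample budget, which is the instability that makes inference unreliable without a warm start.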