Appendix

A.1 Proof of Propositions 1 and 3

To prove Proposition 1, we first need the following lemma.

Lemma 1 (Alternative equivalent definition of functional KL divergence [47]). Let p(f) and q(f) be two distributions over random functions on an index set T. Then

\[
D_{\mathrm{KL}}\big(p(f) \,\|\, q(f)\big) \;=\; \sup_{n \in \mathbb{Z}_{>0},\; X \in T^n} D_{\mathrm{KL}}\big(p(f_X) \,\|\, q(f_X)\big),
\]

where f_X = (f(x_1), ..., f(x_n)) denotes the finite-dimensional marginal of f at the measurement points X = (x_1, ..., x_n). Readers may refer to [47] for the proof of this lemma.

Proposition 1. Suppose c has full support on T. Then the true posterior q(f | D) minimizes both the functional KL divergence and its c-based counterpart.

Proof. Both divergences are non-negative, and at q(f | D) both achieve their minimum value, 0. Therefore, minimizing either objective recovers q(f | D).

Proposition 3. Let n ∈ \mathbb{Z}_{>0} and X ∈ T^n.

A.2 Proof of Proposition 2

Proposition 2. Let p(f) and q(f) be two distributions over random functions. Define the measurement-point distribution c by

\[
n \sim p(n), \qquad X_k \overset{\mathrm{i.i.d.}}{\sim} U(T), \quad 1 \le k \le n.
\]

That is, c first samples a positive integer n from the distribution p(n), and then draws n samples from T independently and uniformly. We now discuss the two cases separately.

The first inequality follows from the data processing inequality: since the finite-dimensional marginal f_X is obtained from f by restriction to the measurement points X, we have D_KL(p(f_X) || q(f_X)) <= D_KL(p(f) || q(f)). For example, p(n) could be a geometric distribution with mean greater than 1 (equivalently, with success probability strictly between 0 and 1). Since the geometric distribution has full support on \mathbb{Z}_{>0}, p(n) is strictly positive for every positive integer n.

As in the first case, let p(n) be a geometric distribution with mean greater than 1 (again, with success probability strictly between 0 and 1).
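As a concrete illustration of the choice of p(n) used in both cases (the symbol gamma for the success probability is ours, not the paper's): a geometric distribution with success probability gamma in (0, 1) has probability mass

\[
p(n) = (1 - \gamma)^{n-1}\,\gamma, \qquad n = 1, 2, \dots, \qquad \mathbb{E}[n] = 1/\gamma > 1,
\]

which is strictly positive for every n in \mathbb{Z}_{>0}; this is exactly the full-support property on which the argument relies.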
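To make the role of the data processing inequality explicit, the following chain is a sketch under the assumption that the c-based divergence is the expected finite-dimensional KL over measurement sets drawn from c; the shorthand D_c is ours and is not recoverable from the text above:

\[
D_c\big(p \,\|\, q\big) := \mathbb{E}_{n \sim p(n)}\,\mathbb{E}_{X_k \overset{\mathrm{i.i.d.}}{\sim} U(T)}\Big[ D_{\mathrm{KL}}\big(p(f_X) \,\|\, q(f_X)\big) \Big] \;\le\; \sup_{n,\, X} D_{\mathrm{KL}}\big(p(f_X) \,\|\, q(f_X)\big) \;=\; D_{\mathrm{KL}}\big(p(f) \,\|\, q(f)\big).
\]

The inequality holds term by term, because each marginal KL is bounded by the functional KL via the data processing inequality, and the final equality is Lemma 1.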
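The sampling scheme defining c is straightforward to simulate. The sketch below assumes T = [0, 1] and a geometric p(n) with success probability gamma = 0.5; these choices, and all identifiers, are illustrative rather than taken from the paper.

import numpy as np

rng = np.random.default_rng(0)

def sample_measurement_set(gamma=0.5, T=(0.0, 1.0)):
    # n ~ Geometric(gamma) on {1, 2, ...}; mean 1/gamma > 1 for 0 < gamma < 1,
    # and every positive integer n has strictly positive probability.
    n = rng.geometric(gamma)
    # X_1, ..., X_n drawn independently and uniformly from T.
    return rng.uniform(T[0], T[1], size=n)

# Each draw is a finite measurement set of random size.
print([len(sample_measurement_set()) for _ in range(5)])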