A Proofs of propositions

Neural Information Processing Systems 

In this section, we provide proofs of all results mentioned in the main paper.

[...] $H(X)$ is given by (Cover and Thomas, 2006, Theorem 8.4.1) [...] Proposition A.1 shows that the conditional distribution of [...] Theorem 2.4.1), and used that only one of the terms depends on [...]

[...] GP, then the posterior belief about the reward $\hat{r}(s) \mid (q, y)$ is also a GP.

The prior distribution of $\hat{r}$ is Gaussian, i.e., $P(\hat{r} \mid S) = \mathcal{N}(\mu, \Sigma)$, with mean $\mu$ and covariance $\Sigma$. [...] Finally, we can use standard results on conditioning Gaussian distributions (cf. Williams, 2006, Chapter A.2) to find that the conditional distribution is still Gaussian [...]

[...] Then, the queries ask about the return, i.e., the sum of rewards, of this sequence of states.
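
The conditioning argument can be illustrated numerically. The following is a minimal sketch, not the paper's implementation. It assumes a finite set of states, a Gaussian prior $\mathcal{N}(\mu, \Sigma)$ over their rewards, and a query response modeled as $y = c^\top \hat{r} + \varepsilon$ with $\varepsilon \sim \mathcal{N}(0, \sigma^2)$, where $c$ counts how often each state occurs in the queried sequence. The noise model, the function names `condition_on_return` and `gaussian_entropy`, and the parameter `noise_var` are assumptions made for illustration. Because the observation is a scalar, the standard Gaussian conditioning formulas reduce to a rank-one update and need no matrix inversion.

```python
import math


def condition_on_return(mu, Sigma, c, y, noise_var=0.0):
    """Posterior of r ~ N(mu, Sigma) given y = c^T r + eps, eps ~ N(0, noise_var).

    mu: list of prior means, Sigma: prior covariance (list of lists),
    c: visit counts of each state in the queried sequence, y: observed return.
    Hypothetical helper for illustration only.
    """
    n = len(mu)
    # Sigma c: covariance between each state's reward and the queried return
    Sc = [sum(Sigma[i][j] * c[j] for j in range(n)) for i in range(n)]
    # Predictive variance of the observation: c^T Sigma c + noise_var
    s = sum(c[i] * Sc[i] for i in range(n)) + noise_var
    # Difference between the observed and the expected return
    resid = y - sum(c[i] * mu[i] for i in range(n))
    # Rank-one Gaussian conditioning update
    mu_post = [mu[i] + Sc[i] * resid / s for i in range(n)]
    Sigma_post = [[Sigma[i][j] - Sc[i] * Sc[j] / s for j in range(n)]
                  for i in range(n)]
    return mu_post, Sigma_post


def gaussian_entropy(var):
    """Differential entropy (in nats) of a univariate Gaussian with variance var."""
    return 0.5 * math.log(2.0 * math.pi * math.e * var)


if __name__ == "__main__":
    mu = [0.0, 0.0, 0.0]
    Sigma = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    c = [1.0, 1.0, 0.0]  # the queried sequence visits states 0 and 1 once each
    mu_post, Sigma_post = condition_on_return(mu, Sigma, c, y=2.0, noise_var=1.0)
    print(mu_post)
    print(Sigma_post)
```

In this sketch the posterior is again Gaussian, the reward of the unvisited state is left unchanged under the independent prior, and the variance of the queried return (and hence its entropy) decreases after the observation.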