
Neural Information Processing Systems 

We use $\mathrm{Law}(X)$ to denote the distribution of a random variable $X$. When $\nu$ is a probability distribution over a set $\Omega$ and $A$ is a subset of $\Omega$, we use $\nu(A)$ to denote the probability that $X$ belongs to $A$ when $X$ is sampled from $\nu$.

Similarly, the marginal distribution of $f_{\gamma,r}\#\nu$ on the next $N$ random variables is $f_{\gamma,r}\#\nu_2$. This proves equation (22), and hence Lemma 3.

Lemma 4. Suppose $Z_1(\cdot \mid A)$ and $Z_2(\cdot \mid A)$ are two conditional distributions with range $\mathbb{R}^N$, and that for all values $a \in \Omega$,
$$W_p\big(\mathrm{Law}(Z_1(a)), \mathrm{Law}(Z_2(a))\big) \le c, \tag{23}$$
where $Z_1(a) \sim Z_1(\cdot \mid A = a)$ and $Z_2(a) \sim Z_2(\cdot \mid A = a)$.
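The hypothesis (23) of Lemma 4 is a uniform bound on $W_p$ between the two conditional laws. The sketch below shows one way to estimate the left-hand side numerically. It assumes the special case $N = 1$ and a finite $\Omega$, and it assumes access to samplers for $Z_1(\cdot \mid A = a)$ and $Z_2(\cdot \mid A = a)$. The function names and samplers are hypothetical illustrations, not part of the paper. In one dimension, the optimal coupling between two equal-size empirical measures matches sorted samples, so $W_p^p = \frac{1}{n}\sum_i |x_{(i)} - y_{(i)}|^p$. For $N > 1$ this shortcut does not apply, and a general optimal-transport solver would be needed.

```python
import random


def empirical_wasserstein_1d(xs, ys, p=1.0):
    """W_p between two empirical measures on R with equal sample sizes.

    Only valid in one dimension (N = 1): there the optimal coupling
    pairs the i-th smallest x with the i-th smallest y.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty samples of equal size")
    if p < 1:
        raise ValueError("p must be >= 1")
    xs_sorted, ys_sorted = sorted(xs), sorted(ys)
    total = sum(abs(x - y) ** p for x, y in zip(xs_sorted, ys_sorted))
    return (total / len(xs)) ** (1.0 / p)


def max_conditional_wasserstein(sample_z1, sample_z2, omega, n=2000, p=1.0, seed=0):
    """Estimate max over a in omega of W_p(Law(Z1(a)), Law(Z2(a))).

    Assumptions (hypothetical, for illustration): omega is finite, and
    sample_z1(a, rng) / sample_z2(a, rng) each return one real-valued draw
    from Z1(.|A=a) / Z2(.|A=a).
    """
    rng = random.Random(seed)
    worst = 0.0
    for a in omega:
        xs = [sample_z1(a, rng) for _ in range(n)]
        ys = [sample_z2(a, rng) for _ in range(n)]
        worst = max(worst, empirical_wasserstein_1d(xs, ys, p))
    return worst
```

For example, take $Z_1(a) \sim \mathcal{N}(a, 1)$ and $Z_2(a) \sim \mathcal{N}(a + 0.5, 1)$. Two Gaussians that differ only by a shift have $W_p$ equal to the shift for every $p \ge 1$. So `max_conditional_wasserstein(lambda a, rng: rng.gauss(a, 1.0), lambda a, rng: rng.gauss(a + 0.5, 1.0), omega=[0, 1, 2])` should return a value near $0.5$, consistent with (23) holding for $c = 0.5$. It will not return exactly $0.5$, because of sampling error.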