Derivations

Neural Information Processing Systems 

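Lemma 1 (Ensemble Sample Diversity Decomposition). Let $\rho$ denote the state-action visitation distribution of the ensemble policy, and let $H(\rho)$ be its entropy. By the definition of mutual information,

$$I(\rho; z) = H(\rho) - H(\rho \mid z) = H(z) - H(z \mid \rho). \qquad (4)$$

Because the latent variable $z$ is selected at random, we treat $H(z)$ as a constant that depends only on the number of possible values of $z$ (for example, $H(z) = \log K$ when $z$ is drawn uniformly from $K$ values).

Lemma 3. Let $X_1, X_2, \ldots$ be an infinite sequence of i.i.d. random variables with common CDF $F$ and PDF $f$, and let $X_{N:N} = \max(X_1, \ldots, X_N)$ denote the largest of the first $N$. The PDF of $X_{N:N}$ is obtained by differentiating its CDF: since $P(X_{N:N} \le x) = F(x)^N$, we have $f_{N:N}(x) = N F(x)^{N-1} f(x)$.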
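As a sanity check on the order-statistic formula in Lemma 3, the Python sketch below compares the closed form $N F(x)^{N-1} f(x)$ against a finite-difference derivative of $F(x)^N$. The standard normal base distribution, the choice $N = 5$, and the helper names (`max_cdf`, `max_pdf`, `numeric_derivative`) are illustrative assumptions, not part of the original derivation.

```python
import math


def max_cdf(x, n, cdf):
    """CDF of X_{N:N} = max(X_1, ..., X_N) for i.i.d. X_i: P(all X_i <= x) = F(x)^N."""
    return cdf(x) ** n


def max_pdf(x, n, pdf, cdf):
    """PDF of X_{N:N}: the derivative of F(x)^N, i.e. N * F(x)^(N-1) * f(x)."""
    return n * cdf(x) ** (n - 1) * pdf(x)


# Illustrative base distribution: standard normal PDF and CDF.
def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def numeric_derivative(g, x, h=1e-5):
    """Central finite difference, used only to cross-check the closed-form PDF."""
    return (g(x + h) - g(x - h)) / (2.0 * h)


if __name__ == "__main__":
    n = 5
    for x in (-1.0, 0.0, 0.5, 2.0):
        closed = max_pdf(x, n, norm_pdf, norm_cdf)
        numeric = numeric_derivative(lambda t: max_cdf(t, n, norm_cdf), x)
        print(f"x={x:+.1f}  closed-form={closed:.6f}  finite-diff={numeric:.6f}")
```

The two columns should agree to several decimal places. For a Uniform(0, 1) base distribution ($F(x) = x$, $f(x) = 1$) the same function reduces to the familiar $N x^{N-1}$.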

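Returning to Lemma 1, the identity in Eq. (4) can be checked numerically on a toy discrete example. The sketch below assumes $z$ is uniform over $K$ latent values, so $H(z) = \log K$, and that each row of `cond_visits` is a hypothetical visitation distribution $\rho(\cdot \mid z = k)$ over a small finite set of state-action pairs; the ensemble distribution $\rho$ is taken to be their mixture. Entropies are in nats. The function name `decomposition` and all numbers are made up for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy (nats) of a discrete distribution, skipping zero entries."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def decomposition(cond_visits):
    """
    cond_visits[k] is a hypothetical visitation distribution rho(. | z=k) for latent
    value k; z is assumed uniform over the K rows.
    Returns (H(rho), H(rho|z), H(z), H(z|rho)).
    """
    k = len(cond_visits)
    p_z = 1.0 / k
    n_sa = len(cond_visits[0])
    # Ensemble visitation distribution: rho(j) = sum_k p(z=k) * rho(j | z=k).
    marginal = [sum(p_z * row[j] for row in cond_visits) for j in range(n_sa)]
    h_rho = entropy(marginal)
    h_rho_given_z = sum(p_z * entropy(row) for row in cond_visits)
    h_z = math.log(k)  # uniform z over K values: H(z) = log K
    # H(z|rho) = sum_j rho(j) * H(p(z | j)), with p(z=k | j) = p_z * rho(j|k) / rho(j).
    h_z_given_rho = 0.0
    for j, m in enumerate(marginal):
        if m > 0.0:
            posterior = [p_z * row[j] / m for row in cond_visits]
            h_z_given_rho += m * entropy(posterior)
    return h_rho, h_rho_given_z, h_z, h_z_given_rho


if __name__ == "__main__":
    rows = [
        [0.7, 0.2, 0.1, 0.0],
        [0.1, 0.1, 0.4, 0.4],
        [0.25, 0.25, 0.25, 0.25],
    ]
    h_rho, h_rho_z, h_z, h_z_rho = decomposition(rows)
    print("I via H(rho) - H(rho|z):", h_rho - h_rho_z)
    print("I via H(z) - H(z|rho):  ", h_z - h_z_rho)
```

Both printed values should agree up to floating-point error. Rows with disjoint supports drive $H(z \mid \rho)$ to zero, so $I(\rho; z)$ reaches its ceiling $H(z) = \log K$; identical rows give $I(\rho; z) = 0$.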