Baselines
Neural Information Processing Systems
As shown in the main text, under the assumption that the influence network is unbiased, our factor baselines are indeed valid control variates. We prove this result below, repeating the statement itself for posterity and providing a supplementary lemma on control variates as a restatement of known results. Let X, Y and Z be random variables where the law of X conditional on Z is denoted Pθ(X|Z), and Y is independent of X conditioned on Z. Then, we have that E[Y ∇θ ln Pθ(X|Z)] = 0. Proof. Factor baselines are valid control variates if GΣ is true to the MDP (i.e.
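The lemma stated above (the expectation of Y times the score of X vanishes when Y is independent of X given Z) can be sanity-checked numerically. The following is a minimal sketch, not the paper's construction: the toy distributions, the function name `score_estimates`, and the parameter values are all hypothetical, and the score is assumed to be that of the conditional law Pθ(X|Z). It also estimates the same quantity for a variable that does depend on X, where the expectation is no longer zero.

```python
import random


def score_estimates(theta=0.7, n=200_000, seed=0):
    """Monte Carlo estimates of E[Y * score] and E[X * score].

    Hypothetical toy model (an assumption for illustration only):
      Z ~ Bernoulli(0.5)
      X | Z ~ Normal(theta * (Z + 1), 1)
      Y = 3 * Z + 1 + independent Gaussian noise  (independent of X given Z)
    """
    rng = random.Random(seed)
    sum_indep = 0.0  # accumulates Y * score, with Y independent of X given Z
    sum_dep = 0.0    # accumulates X * score, where the multiplier depends on X
    for _ in range(n):
        z = rng.randint(0, 1)
        mean = theta * (z + 1)
        x = rng.gauss(mean, 1.0)
        # Score of the conditional law: d/dtheta ln N(x; theta * (z + 1), 1)
        score = (x - mean) * (z + 1)
        y = 3.0 * z + 1.0 + rng.gauss(0.0, 1.0)
        sum_indep += y * score
        sum_dep += x * score
    return sum_indep / n, sum_dep / n


indep_est, dep_est = score_estimates()
print(f"E[Y * score] estimate (Y independent of X given Z): {indep_est:.4f}")
print(f"E[X * score] estimate (multiplier depends on X):    {dep_est:.4f}")
```

In this toy model the first estimate should be close to zero up to Monte Carlo error, while the second should be close to E[Z + 1] = 1.5. This is the property that lets a term independent of X given Z be subtracted as a baseline without biasing a score-function gradient estimate.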
Apr-25-2026, 06:34:21 GMT