A Sample-dependent Baselines in REBAR and RELAX

We start with the REINFORCE estimator with the sample-dependent baseline b_k: 1/K

Neural Information Processing Systems 

H controlled by the parameter . To form modified RELAX in Section 6.3, we replace The results are shown in Figure 5. In fact, for this VAE architecture, the per-iteration time of RODEO is 25.2 ms, which is very close to the 23.1 ms of RLOO. We do not observe a significant difference between the two versions of RODEO. Throughout, we call this the "test log-likelihood bound."
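
For readers unfamiliar with the RLOO baseline used in the timing comparison, the following is a minimal sketch of the standard REINFORCE leave-one-out estimator, in which each sample's baseline is the mean of f over the other K - 1 samples. It is not taken from the paper's code: the factorized Bernoulli distribution parameterized by logits, the function names `sigmoid` and `rloo_gradient`, and the toy objective are all assumptions made for illustration.

```python
import numpy as np


def sigmoid(z):
    """Elementwise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def rloo_gradient(f, logits, K, rng):
    """Leave-one-out REINFORCE estimate of d/d(logits) E[f(x)].

    Assumes x ~ prod_i Bernoulli(sigmoid(logits_i)); this distribution
    is an illustrative choice, not the paper's model.
    """
    p = sigmoid(logits)
    # Draw K independent binary samples, shape (K, d).
    x = (rng.random((K, logits.shape[0])) < p).astype(float)
    fx = np.array([f(xk) for xk in x])  # shape (K,)
    # Baseline for sample k: mean of f over the other K - 1 samples.
    baseline = (fx.sum() - fx) / (K - 1)
    # Score function of a Bernoulli with respect to its logit: x - p.
    score = x - p
    return ((fx - baseline)[:, None] * score).mean(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    toy_f = lambda x: np.sum((x - 0.3) ** 2)  # hypothetical objective
    print(rloo_gradient(toy_f, np.zeros(3), K=4, rng=rng))
```

Because the leave-one-out baseline for sample k does not depend on sample k itself, the estimator stays unbiased without a correction term, which is the property that distinguishes it from baselines that depend on the sample they are applied to.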
