Reviews: REBAR: Low-variance, unbiased gradient estimates for discrete latent variable models
Neural Information Processing Systems
Summary: This paper proposes a control variate (CV) for the REINFORCE gradient estimator (RGE) of a discrete distribution. The CV is based on the Concrete distribution (CD), a continuous relaxation of the discrete distribution that, used on its own, admits only biased Monte Carlo (MC) estimates of the discrete distribution's gradient. Yet using the CD as a CV yields an *unbiased* estimator of a discrete random variable's (rv) path gradient, with lower variance than the RGE (as expected). REBAR is derived by applying the REINFORCE estimator to the CD and observing that, given a discrete draw, the CD's continuous variable (z, here) can be marginalized out. REBAR also has some nice connections to other estimators for discrete rv gradients, including MuProp.
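To make the construction in the summary concrete, here is a minimal sketch for a single Bernoulli variable. It is an illustration under stated assumptions, not the paper's implementation: the toy objective f(b) = (b - t)^2, the values of `theta`, `lam` (relaxation temperature) and `eta` (CV scale), and every function name are hypothetical, and derivatives are written out by hand instead of using autodiff. The estimator combines the REINFORCE term, a relaxed CV evaluated at a sample `z_tilde` drawn conditionally on the discrete draw `b`, and two reparameterized gradient terms that correct for the CV's mean.

```python
import math
import random
import statistics

T = 0.45  # assumed target in the toy objective f(b) = (b - T)^2

def f(x):
    return (x - T) ** 2

def df(x):
    return 2.0 * (x - T)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):
    return math.log(p) - math.log1p(-p)

def uniform_open(rng, eps=1e-12):
    # Uniform draw kept strictly inside (0, 1) so logit() stays finite.
    return min(max(rng.random(), eps), 1.0 - eps)

def relaxed_value_and_grad(z, dz_dtheta, lam):
    # f(sigmoid(z / lam)) and its derivative w.r.t. theta via the chain rule.
    s = sigmoid(z / lam)
    return f(s), df(s) * s * (1.0 - s) / lam * dz_dtheta

def sample_estimators(theta, lam, eta, rng):
    """Return one (REINFORCE, REBAR) gradient sample for b ~ Bernoulli(theta)."""
    u, v = uniform_open(rng), uniform_open(rng)
    dlogit = 1.0 / (theta * (1.0 - theta))      # d logit(theta) / d theta

    # Relaxed sample z and the hard draw b = 1[z > 0], i.e. b = 1 iff u > 1 - theta.
    z = logit(theta) + logit(u)
    b = 1.0 if z > 0.0 else 0.0

    # Conditional relaxed sample z_tilde ~ p(z | b), reparameterized through v.
    if b == 1.0:
        u_c, du_c = (1.0 - theta) + v * theta, v - 1.0
    else:
        u_c, du_c = v * (1.0 - theta), -v
    z_tilde = logit(theta) + logit(u_c)
    dz_tilde = dlogit + du_c / (u_c * (1.0 - u_c))

    score = (b - theta) * dlogit                # d log p(b) / d theta
    _, g_z = relaxed_value_and_grad(z, dlogit, lam)
    f_zt, g_zt = relaxed_value_and_grad(z_tilde, dz_tilde, lam)

    reinforce = f(b) * score
    rebar = (f(b) - eta * f_zt) * score + eta * g_z - eta * g_zt
    return reinforce, rebar

def estimate(theta=0.3, lam=0.5, eta=1.0, n=100_000, seed=0):
    """Monte Carlo means and variances of both estimators."""
    rng = random.Random(seed)
    pairs = [sample_estimators(theta, lam, eta, rng) for _ in range(n)]
    rf = [p[0] for p in pairs]
    rb = [p[1] for p in pairs]
    return (statistics.fmean(rf), statistics.fmean(rb),
            statistics.pvariance(rf), statistics.pvariance(rb))

if __name__ == "__main__":
    exact = f(1.0) - f(0.0)  # d/dtheta E[f(b)] for a Bernoulli(theta) variable
    m_rf, m_rb, v_rf, v_rb = estimate()
    print("exact gradient:", exact)
    print("REINFORCE mean/var:", m_rf, v_rf)
    print("REBAR mean/var:", m_rb, v_rb)
```

Both sample means should land close to the exact gradient f(1) - f(0), which is the unbiasedness property the summary highlights. How the two variances compare in this toy setting depends on `eta` and `lam`, which are fixed by hand in this sketch, so the printed variances are not a reproduction of the paper's results.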
Oct-8-2024, 12:29:59 GMT