Improved Gradient-Based Optimization Over Discrete Distributions

Evgeny Andriyash, Arash Vahdat, Bill Macready

arXiv.org Machine Learning 

In many applications we seek to maximize an expectation with respect to a distribution over discrete variables. Estimating gradients of such objectives with respect to the distribution parameters is a challenging problem. We analyze existing solutions, including finite-difference (FD) estimators and continuous relaxation (CR) estimators, in terms of bias and variance. We show that the commonly used Gumbel-Softmax estimator is biased and propose a simple method to reduce its bias. We also derive a simpler piece-wise linear continuous relaxation that likewise has reduced bias. We demonstrate empirically that reduced bias leads to better performance in variational inference and on binary optimization tasks.

Discrete stochastic variables arise naturally for certain types of data, and distributions over discrete variables can be important components of probabilistic models. The objective in Eq. (1), an expectation E_{q_θ(z)}[f(z)] of a function f under a parameterized discrete distribution q_θ(z), is commonly minimized by gradient-based methods, which require estimating its gradient with respect to the parameters θ. The two main approaches to this problem are score-function estimators and pathwise derivative estimators (see Schulman et al. (2015) for an overview). In this paper we focus on pathwise derivative estimators. This approach is applicable when the stochastic variables can be reparameterized as a function of the parameters and of other, parameter-independent random variables, i.e. z = g(θ, ε) with ε drawn from a fixed distribution. For discrete variables, however, the cumulative distribution function (CDF) is discontinuous and such a reparameterization is not possible.
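As a concrete illustration of the bias issue described above, the sketch below compares, for a single Bernoulli variable and a toy objective, the exact gradient of the discrete expectation with a score-function (REINFORCE) estimate and a pathwise estimate obtained through a Gumbel-Softmax (binary Concrete) relaxation at a fixed temperature. The particular setup (NumPy, the choice of f, θ, and temperature τ) is assumed for illustration only and is not taken from the paper's experiments.

import numpy as np

rng = np.random.default_rng(0)

# Toy setup (illustrative assumptions, not the paper's experiments):
# one Bernoulli variable z ~ q_theta(z) with q_theta(z=1) = theta, and a
# fixed objective f(z). We want the gradient d/dtheta E_q[f(z)].
theta = 0.3
tau = 0.5                               # Gumbel-Softmax temperature


def f(z):
    # arbitrary smooth function of z
    return (z - 0.45) ** 2


# Exact gradient of the discrete expectation:
# E[f] = theta * f(1) + (1 - theta) * f(0)  =>  dE/dtheta = f(1) - f(0)
exact_grad = f(1.0) - f(0.0)

n = 2_000_000
u = rng.uniform(size=n)

# 1) Score-function (REINFORCE) estimator: f(z) * d/dtheta log q_theta(z).
z = (u < theta).astype(float)
dlogq_dtheta = z / theta - (1.0 - z) / (1.0 - theta)
sf_grad = np.mean(f(z) * dlogq_dtheta)

# 2) Pathwise estimator through a Gumbel-Softmax (binary Concrete) relaxation:
#    z_relaxed = sigmoid((logit(theta) + logistic_noise) / tau), then apply
#    the chain rule to f(z_relaxed) with respect to theta.
logistic = np.log(u) - np.log1p(-u)     # Logistic(0, 1) noise
alpha = np.log(theta) - np.log1p(-theta)
s = 1.0 / (1.0 + np.exp(-(alpha + logistic) / tau))
df_ds = 2.0 * (s - 0.45)                # derivative of f at the relaxed sample
ds_dalpha = s * (1.0 - s) / tau
dalpha_dtheta = 1.0 / (theta * (1.0 - theta))
gs_grad = np.mean(df_ds * ds_dalpha * dalpha_dtheta)

print(f"exact gradient          : {exact_grad:+.4f}")
print(f"score-function estimate : {sf_grad:+.4f}   (unbiased, but high variance)")
print(f"Gumbel-Softmax estimate : {gs_grad:+.4f}   (biased for tau = {tau})")

Under these assumptions the score-function estimate matches the exact gradient in expectation, while the relaxed pathwise estimate targets the gradient of the relaxed objective and therefore deviates from the discrete gradient at any nonzero temperature; this is the bias that the paper analyzes and reduces.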
