
Neural Information Processing Systems

The Gumbel-Max trick is the basis of many relaxed gradient estimators. These estimators are easy to implement and low-variance, but the goal of scaling them comprehensively to large combinatorial distributions is still outstanding. Working within the perturbation model framework, we introduce stochastic softmax tricks, which generalize the Gumbel-Softmax trick to combinatorial spaces.
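As a point of reference for the tricks the abstract builds on, here is a minimal NumPy sketch of the standard Gumbel-Max and Gumbel-Softmax tricks for a single categorical variable (the paper's stochastic softmax tricks generalize this to structured combinatorial spaces, which this sketch does not cover; the function names and the temperature value are illustrative):

```python
import numpy as np

def sample_gumbel(shape, rng):
    # Gumbel(0, 1) samples via -log(-log(U)), U ~ Uniform(0, 1).
    u = rng.uniform(low=1e-12, high=1.0, size=shape)
    return -np.log(-np.log(u))

def gumbel_max(logits, rng):
    # Gumbel-Max trick: argmax of perturbed logits is an exact
    # sample from the categorical distribution softmax(logits).
    return int(np.argmax(logits + sample_gumbel(logits.shape, rng)))

def gumbel_softmax(logits, tau, rng):
    # Gumbel-Softmax relaxation: replace the non-differentiable
    # argmax with a temperature-controlled softmax. As tau -> 0 the
    # output approaches a one-hot sample.
    y = (logits + sample_gumbel(logits.shape, rng)) / tau
    y = y - y.max()          # numerical stability
    e = np.exp(y)
    return e / e.sum()

rng = np.random.default_rng(0)
logits = np.log(np.array([0.5, 0.3, 0.2]))
hard = gumbel_max(logits, rng)              # exact categorical sample
soft = gumbel_softmax(logits, 0.5, rng)     # relaxed sample on the simplex
```

The relaxed sample `soft` lies on the probability simplex and is differentiable with respect to `logits`, which is what makes relaxed gradient estimators straightforward to implement.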




Direct Policy Gradients: Direct Optimization of Policies in Discrete Action Spaces

Neural Information Processing Systems

Many problems in machine learning reduce to learning a probability distribution (or policy) over sequences of discrete actions so as to maximize a downstream utility function. Examples include generating text sequences to maximize a task-specific metric like BLEU and generating action sequences in reinforcement learning (RL) to maximize expected return.
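For context on the problem setup the abstract describes, here is a toy sketch of the classic score-function (REINFORCE) estimator for maximizing expected utility under a softmax policy over discrete actions. This is the standard baseline method, not the paper's direct-optimization approach; the utility function, step size, and sample count are illustrative assumptions:

```python
import numpy as np

def reinforce_step(logits, utility, rng, lr=0.1, n=512):
    # One gradient-ascent step on E_{a ~ softmax(logits)}[utility(a)]
    # using the score-function estimator with a mean baseline.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    actions = rng.choice(len(logits), size=n, p=probs)
    rewards = np.array([utility(a) for a in actions])
    baseline = rewards.mean()               # simple variance reduction
    grad = np.zeros_like(logits)
    for a, r in zip(actions, rewards):
        # grad_logits log pi(a) = one_hot(a) - probs for a softmax policy.
        glogp = -probs
        glogp[a] += 1.0
        grad += (r - baseline) * glogp
    return logits + lr * grad / n

rng = np.random.default_rng(0)
utility = lambda a: [0.0, 1.0, 0.2][a]      # toy utility; action 1 is best
logits = np.zeros(3)
for _ in range(200):
    logits = reinforce_step(logits, utility, rng)
```

After training, the policy concentrates its mass on the highest-utility action; in sequence settings such as text generation for BLEU, the same estimator is applied over whole action sequences, where its variance becomes the central difficulty.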