gumbel
Country:
- North America > United States (0.29)
- North America > Canada (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Country:
- North America > Canada > Ontario > Toronto (0.05)
- Asia > Middle East > Israel > Haifa District > Haifa (0.05)
- North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
Country:
- North America > United States > Maryland (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Europe > Spain > Canary Islands (0.04)
- Asia > Middle East > Israel (0.04)
Country:
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Asia > China > Shaanxi Province > Xi'an (0.04)
3df80af53dce8435cf9ad6c3e7a403fd-Paper.pdf
The Gumbel-Max trick is the basis of many relaxed gradient estimators. These estimators are easy to implement and low variance, but the goal of scaling them comprehensively to large combinatorial distributions is still outstanding. Working within the perturbation model framework, we introduce stochastic softmax tricks, which generalize the Gumbel-Softmax trick to combinatorial spaces.
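For readers unfamiliar with the two tricks named in the abstract, the following is a minimal sketch of the plain categorical case only, not the paper's combinatorial generalization. It assumes a vector of unnormalized log-probabilities; the function names (`sample_gumbel`, `gumbel_max_sample`, `gumbel_softmax_sample`) and the `temperature` parameter are illustrative choices, not taken from the paper.

```python
import math
import random


def sample_gumbel(rng):
    """Draw one standard Gumbel(0, 1) variate by inverse-CDF sampling."""
    u = rng.random()
    # rng.random() returns values in [0, 1); redraw on exactly 0 to avoid log(0).
    while u == 0.0:
        u = rng.random()
    return -math.log(-math.log(u))


def gumbel_max_sample(logits, rng):
    """Gumbel-Max trick: the argmax of Gumbel-perturbed logits is an exact
    sample from the categorical distribution softmax(logits)."""
    perturbed = [logit + sample_gumbel(rng) for logit in logits]
    return max(range(len(logits)), key=perturbed.__getitem__)


def gumbel_softmax_sample(logits, temperature, rng):
    """Gumbel-Softmax relaxation: replace the argmax with a tempered softmax,
    giving a point on the probability simplex instead of a hard index."""
    perturbed = [(logit + sample_gumbel(rng)) / temperature for logit in logits]
    peak = max(perturbed)  # subtract the max for numerical stability
    exps = [math.exp(p - peak) for p in perturbed]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [math.log(0.7), math.log(0.2), math.log(0.1)]
    print(gumbel_max_sample(logits, rng))
    print(gumbel_softmax_sample(logits, temperature=0.5, rng=rng))
```

As the temperature approaches zero, the relaxed sample concentrates on the coordinate that Gumbel-Max would have chosen for the same noise; larger temperatures give smoother outputs, generally at the cost of more bias relative to the discrete sample.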
Country:
- North America > Canada > Ontario > Toronto (0.05)
- North America > United States > New Jersey > Mercer County > Princeton (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Direct Policy Gradients: Direct Optimization of Policies in Discrete Action Spaces
Many problems in machine learning reduce to learning a probability distribution (or policy) over sequences of discrete actions so as to maximize a downstream utility function. Examples include generating text sequences to maximize a task-specific metric like BLEU and generating action sequences in reinforcement learning (RL) to maximize expected return.
Country:
- North America > United States > Maryland (0.04)
- North America > Canada (0.04)
- Europe > Spain > Canary Islands (0.04)
- Asia > Middle East > Israel (0.04)