On The Statistical Representation Properties Of The Perturb-Softmax And The Perturb-Argmax Probability Distributions
Indelman, Hedda Cohen, Hazan, Tamir
The Gumbel-Softmax probability distribution allows learning discrete tokens in generative learning, while the Gumbel-Argmax probability distribution is useful in learning discrete structures in discriminative learning. Despite the efforts invested in optimizing these probability models, their statistical properties are under-explored. In this work, we investigate their representation properties and determine for which families of parameters these probability distributions are complete, i.e., can represent any probability distribution, and minimal, i.e., can represent a probability distribution uniquely. We rely on convexity and differentiability to determine these statistical conditions and extend this framework to general probability models, such as Gaussian-Softmax and Gaussian-Argmax. We experimentally validate the qualities of these extensions, which enjoy a faster convergence rate. We conclude the analysis by identifying two sets of parameters that satisfy these assumptions and thus admit a complete and minimal representation. Our contribution is theoretical with supporting practical evaluation.
Jun-4-2024
- Country:
- North America > United States
- New York > New York County
- New York City (0.04)
- New Jersey > Mercer County
- Princeton (0.04)
- Massachusetts > Middlesex County
- Cambridge (0.04)
- New York > New York County
- Europe > France
- Hauts-de-France > Nord > Lille (0.04)
- Asia
- Middle East > Jordan (0.04)
- China (0.04)
- North America > United States
- Genre:
- Research Report (0.82)
- Technology: