Sparse and Continuous Attention Mechanisms André F. T. Martins,, Ant os Treviso Vlad Niculae, Pe
–Neural Information Processing Systems
Exponential families are widely used in machine learning; they include many distributions in continuous and discrete domains (e.g., Gaussian, Dirichlet, Poisson, and categorical distributions via the softmax transformation). Distributions in each of these families have fixed support. In contrast, for finite domains, there has been recent work on sparse alternatives to softmax (e.g.
Neural Information Processing Systems
Mar-21-2025, 18:57:15 GMT