Reviews: Kernelized Bayesian Softmax for Text Generation

Neural Information Processing Systems 

This paper builds on the observation that context vectors from a language model, such as BERT, often cluster into separate groups for the same next word. These clusters may correspond to different senses of the word, and the clusters often have differing variances. The authors argue that a traditional softmax is not expressive enough to capture this structure. A similar argument was made by Yang et al. in their Mixture of Softmaxes (MoS) paper. The solution presented here is quite different though -- allocate multiple senses to each word in the output embedding table, and use a parameterized kernel to model the variance. The ideas are pretty neat, and as far as I know, original.
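To make the core idea concrete for other reviewers: the paper's mechanism amounts to scoring a context vector against several sense embeddings per word, with a per-sense kernel parameter absorbing cluster variance, then aggregating senses into a single word probability. The sketch below is my own minimal illustration, not the authors' implementation; the exponential kernel, the `theta` bandwidth parameter, and the log-sum-exp aggregation over senses are assumptions chosen for clarity.

```python
import numpy as np

def kernelized_softmax(context, sense_emb, theta):
    """Hypothetical sketch of a multi-sense, kernelized softmax.

    context:   (d,)      context vector from the language model
    sense_emb: (V, S, d) S sense embeddings for each of V words (assumed layout)
    theta:     (V, S)    per-sense kernel parameters modelling cluster variance
    Returns:   (V,)      probability distribution over the vocabulary
    """
    # Kernel score for every sense: a scaled inner product, standing in
    # for the paper's parameterized kernel (assumption).
    dots = np.einsum("vsd,d->vs", sense_emb, context)  # (V, S)
    scores = theta * dots

    # Aggregate each word's senses with a numerically stable log-sum-exp.
    m = scores.max(axis=1, keepdims=True)
    word_logits = (m + np.log(np.exp(scores - m).sum(axis=1, keepdims=True))).squeeze(1)

    # Ordinary softmax over words.
    probs = np.exp(word_logits - word_logits.max())
    return probs / probs.sum()
```

Under this view, the usual softmax is the special case of one sense per word and a fixed linear kernel, which is why the proposal is strictly more expressive.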