Review for NeurIPS paper: Bayesian Attention Modules


The improvement over deterministic attention seems marginal on some tasks, such as VQA and machine translation. Unlike prior work on discrete latent variables, which can make interpretability claims, I'm not sure why we would want to model soft attention as a latent variable. Is the stochastic formulation better under low-resource scenarios? Also, can you more thoroughly quantify the benefits of modeling attention uncertainty? Or even qualitatively show a few samples from the attention distribution and check whether they truly reflect the underlying uncertainty.
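The qualitative check requested above could be sketched roughly as follows: draw several samples of stochastic attention weights for one query and inspect their per-token spread. This is a minimal illustration only; the Lognormal reparameterization, the `sigma` noise scale, and the toy scores are assumptions for the sketch, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_attention(scores, sigma=0.5, n_samples=8):
    """Draw stochastic attention weights as renormalized Lognormal samples.

    scores: unnormalized attention logits for one query (shape [n_keys]).
    Returns an array of shape [n_samples, n_keys], each row summing to 1.
    """
    eps = rng.standard_normal((n_samples,) + scores.shape)
    unnorm = np.exp(scores + sigma * eps)  # Lognormal: exp of a Gaussian
    return unnorm / unnorm.sum(axis=-1, keepdims=True)

# Toy query-key scores (hypothetical values for illustration).
scores = np.array([2.0, 0.5, 0.1, -1.0])
samples = sample_attention(scores)

# Per-token mean and standard deviation across samples: a large std
# relative to the mean flags tokens whose attention is uncertain.
mean, std = samples.mean(axis=0), samples.std(axis=0)
for i, (m, s) in enumerate(zip(mean, std)):
    print(f"token {i}: mean attention {m:.3f} +/- {s:.3f}")
```

Plotting a few such sampled rows side by side (or the mean with error bars) would make it easy to see whether the spread concentrates on genuinely ambiguous tokens.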