Review for NeurIPS paper: Bayesian Attention Modules

This paper proposes treating attention weights as continuous latent variables, trained with amortized variational inference in the manner of VAEs. Unnormalized attention weights are sampled from reparametrizable distributions such as the Weibull and log-normal, and then normalized to form the attention distribution. Experiments show that the proposed stochastic attention mechanism outperforms deterministic attention on a wide variety of tasks, including image captioning, visual question answering, machine translation, graph classification, and fine-tuning BERT. All reviewers recommended acceptance, noting that the idea is interesting and the work is solid and well executed. One concern was raised about the significance of the improvements on VQA and NMT, and another about the design choice of setting the prior directly to the approximate posterior; the authors addressed both in the rebuttal.
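To make the sampling step concrete, below is a minimal sketch of reparameterized Weibull attention as described above. It assumes a fixed Weibull shape parameter `k` and a scale tied to the deterministic attention logits; both choices, and the function name `bayesian_attention`, are illustrative rather than the authors' exact parameterization, and the KL regularization against the prior is omitted.

```python
import torch

def bayesian_attention(scores, k=10.0, eps=1e-8):
    """Sample stochastic attention weights via a reparameterized Weibull.

    scores: unnormalized attention logits, shape (..., n_keys).
    k:      Weibull shape parameter (a hypothetical fixed value here).

    A uniform draw is pushed through the Weibull inverse CDF, which keeps
    the sample differentiable w.r.t. the scale (the reparameterization
    trick), so gradients flow through the sampling step during training.
    """
    # Scale parameter derived from the logits (one simple choice).
    lam = torch.exp(scores)
    # Inverse-CDF Weibull sample: lam * (-log(1 - U))**(1/k).
    u = torch.rand_like(lam).clamp(eps, 1.0 - eps)
    w = lam * (-torch.log1p(-u)) ** (1.0 / k)
    # Normalize the positive samples to obtain attention weights.
    return w / (w.sum(dim=-1, keepdim=True) + eps)

# Usage: replace softmax(scores) with a stochastic draw at training time.
scores = torch.randn(2, 4, 8)      # (batch, queries, keys)
attn = bayesian_attention(scores)  # rows sum to 1, but are stochastic
print(attn.sum(dim=-1))            # ~ tensor of ones
```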