Review for NeurIPS paper: Bayesian Attention Modules
This paper proposes treating attention weights as continuous latent variables, trained with variational inference in the style of VAEs. It uses reparametrizable distributions such as the Weibull and log-normal to sample unnormalized weights, which are then normalized to form the attention distribution. Experiments show that the proposed stochastic attention mechanism outperforms deterministic attention on a wide variety of tasks, including image captioning, machine translation, graph classification, and fine-tuning BERT. All reviewers recommended acceptance, noting that the idea is interesting and the work is solid and well executed. Concerns were raised about the significance of the improvements on VQA and NMT, and about directly setting the prior to the approximate posterior; the authors addressed both in the rebuttal.
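For context, the mechanism the summary describes can be sketched compactly: sample unnormalized attention weights from a reparametrizable positive distribution and normalize them, so gradients flow through the sampling step. Below is a minimal PyTorch sketch assuming a log-normal parameterization; the function name `bayesian_attention` and the fixed scale `sigma` are illustrative assumptions, not details taken from the paper, and the KL regularizer against the prior that the training objective would include is omitted.

```python
import torch
import torch.nn.functional as F

def bayesian_attention(query, key, value, sigma=1.0, training=True):
    """Sketch of stochastic attention (illustrative, not the paper's code).

    query: (batch, n_q, d); key, value: (batch, n_k, d).
    `sigma`, the log-normal scale, is an assumed hyperparameter.
    """
    d = query.size(-1)
    # Scaled dot-product scores serve as the mean of the log-weights.
    scores = query @ key.transpose(-2, -1) / d ** 0.5  # (batch, n_q, n_k)
    if training:
        # Reparameterization trick: w = exp(scores + sigma * eps),
        # eps ~ N(0, 1), so the sample is differentiable in `scores`.
        eps = torch.randn_like(scores)
        log_weights = scores + sigma * eps
    else:
        log_weights = scores  # deterministic weights at test time
    # Normalizing the positive log-normal samples across keys is
    # equivalent to a softmax over their logarithms.
    attn = F.softmax(log_weights, dim=-1)
    return attn @ value
```

Note that with log-normal samples the normalization step reduces to a softmax over the perturbed scores, which is why the stochastic module drops in as a replacement for standard deterministic attention.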