Reviews: A Regularized Framework for Sparse and Structured Neural Attention

Neural Information Processing Systems 

Summary: This paper presents a framework for deriving different sparse attention mechanisms by regularizing the max operator with convex functions. Softmax and sparsemax are recovered as special cases of this framework. In addition, two new sparse attention mechanisms are introduced that allow the model to learn to assign the same attention weight to the elements of a contiguous span. My concerns relate to the interpretability motivation and to the choice of baseline attention models. However, the paper is very well presented, and the framework is a notable contribution that I believe will be useful for researchers working with attention mechanisms.
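
To make the two special cases concrete, the following is a minimal sketch written for this review rather than taken from the paper. It assumes the standard definitions: softmax as the normalized exponential, and sparsemax as the Euclidean projection of the scores onto the probability simplex, computed with the usual sort-and-threshold procedure. The function names and the example scores are illustrative only.

```python
import math


def softmax(scores):
    """Dense attention: every score receives a strictly positive weight."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sparsemax(scores):
    """Sparse attention: Euclidean projection of the scores onto the simplex."""
    z_sorted = sorted(scores, reverse=True)
    cumsum = 0.0
    tau = 0.0
    for k, z in enumerate(z_sorted, start=1):
        cumsum += z
        # The k largest scores are in the support while this inequality holds;
        # the last k for which it holds determines the threshold tau.
        if 1 + k * z > cumsum:
            tau = (cumsum - 1) / k
    # Scores at or below the threshold get exactly zero weight.
    return [max(s - tau, 0.0) for s in scores]


if __name__ == "__main__":
    scores = [1.0, 0.8, -1.0]  # hypothetical attention scores
    print("softmax:  ", softmax(scores))
    print("sparsemax:", sparsemax(scores))
```

On these example scores, softmax assigns some weight to all three positions, whereas sparsemax assigns exactly zero to the lowest-scoring one. This is the kind of sparsity the interpretability argument relies on.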