Reviews: Saliency-based Sequential Image Attention with Multiset Prediction

Neural Information Processing Systems 

In this paper, the authors propose a hierarchical visual architecture that operates on a saliency map and uses a novel attention mechanism based on a 2D Gaussian model. The mechanism sequentially focuses on salient regions and takes additional glimpses within those regions for multi-label image classification. The sequential attention model also supports multiset prediction, where a reinforcement-learning-based training procedure allows classification of instances with arbitrary label permutations and multiple instances per label. Pros: 1) The paper proposes a novel saliency-based attention mechanism that uses saliency in the top layer (the meta-controller) together with a new 2D Gaussian-based attention map. This attention map models regional/positional 2D information with a mixture of Gaussian distributions, which is more general than the standard attention layer (as in DRAW and Show, Attend and Tell), where attention is enforced through a softmax activation. The mechanism is intuitive, as it is inspired by human visual attention.
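
To make the contrast in point 1 concrete, here is a minimal NumPy sketch of an attention map built from a mixture of 2D Gaussians, next to a plain softmax over per-location scores. It is an illustration only, not the authors' implementation: the function names, the isotropic (single sigma per component) parameterization, and the final normalization are all assumptions made for this example, and the paper's actual parameterization may differ.

```python
import numpy as np


def gaussian_mixture_attention(height, width, means, sigmas, weights):
    """Attention map from a mixture of isotropic 2D Gaussians (illustrative).

    means:   (K, 2) centers as (row, col) in pixel coordinates
    sigmas:  (K,)   standard deviations in pixels (assumed isotropic)
    weights: (K,)   non-negative mixture weights
    """
    means = np.asarray(means, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    weights = np.asarray(weights, dtype=float)

    # Pixel coordinate grid of shape (H, W, 2).
    rows, cols = np.meshgrid(np.arange(height), np.arange(width), indexing="ij")
    grid = np.stack([rows, cols], axis=-1).astype(float)

    # Squared distance from every pixel to every center: shape (K, H, W).
    sq_dist = ((grid[None] - means[:, None, None, :]) ** 2).sum(axis=-1)

    # Unnormalized Gaussian bumps, weighted and summed over components.
    components = np.exp(-0.5 * sq_dist / sigmas[:, None, None] ** 2)
    attention = (weights[:, None, None] * components).sum(axis=0)

    # Normalize so the map sums to 1 (an assumption made for comparability).
    return attention / attention.sum()


def softmax_attention(scores):
    """Standard softmax attention over a 2D grid of per-location scores."""
    flat = scores.ravel()
    flat = np.exp(flat - flat.max())  # subtract the max for numerical stability
    return (flat / flat.sum()).reshape(scores.shape)


# Two attended regions on an 8x8 map, the first weighted more heavily.
gmm_map = gaussian_mixture_attention(
    8, 8, means=[[2, 2], [5, 6]], sigmas=[1.0, 1.0], weights=[0.7, 0.3]
)
soft_map = softmax_attention(np.random.default_rng(0).normal(size=(8, 8)))
```

In this sketch the Gaussian map is described by a handful of parameters (centers, widths, weights) and is spatially smooth by construction, whereas the softmax map assigns an independent weight to each location.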