Neural Information Processing Systems 

This work addresses the question of how to improve the invariance properties of Convolutional Neural Networks. It introduces the spatial transformer, a layer that performs an adaptive warping of incoming feature maps, thus generalizing recent attention mechanisms for images. The resulting model requires no extra supervision and is trained end-to-end with backpropagation, achieving state-of-the-art results on several classification tasks. The paper is clearly written, and its main contribution, the spatial transformer layer, is valuable for its novelty, simplicity and effectiveness. The related work section covers most of the relevant literature, except perhaps recent works that combine deformable part models with CNNs (see for example "Deformable Part Models are Convolutional Neural Networks" and "End-to-End Integration of a Convolution Network, Deformable Parts Model and Non-Maximum Suppression", both at CVPR 2015), since these also perform inference over deformation or registration parameters, as the spatial transformer does.
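To make the mechanism under review concrete, here is a minimal NumPy sketch of the adaptive warping step the paper proposes: an affine transform maps output grid coordinates back to source coordinates in the input feature map, which is then sampled bilinearly so the whole operation stays differentiable. The function names (`affine_grid`, `bilinear_sample`) and the single-channel setup are my own simplifications, not the paper's code; in the full model the 2x3 matrix `theta` would be predicted by a localization network rather than fixed.

```python
import numpy as np

def affine_grid(theta, H, W):
    # theta: 2x3 affine matrix mapping output coords -> input coords,
    # both expressed in normalized [-1, 1] range.
    ys, xs = np.meshgrid(np.linspace(-1, 1, H),
                         np.linspace(-1, 1, W), indexing="ij")
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(H * W)])  # 3 x HW
    src = theta @ coords                                         # 2 x HW
    return src[0].reshape(H, W), src[1].reshape(H, W)            # x_s, y_s

def bilinear_sample(U, x_s, y_s):
    # Bilinear sampling of feature map U at the source coordinates;
    # piecewise-linear in the coords, hence backprop-friendly.
    H, W = U.shape
    x = (x_s + 1) * (W - 1) / 2  # map [-1, 1] -> pixel coordinates
    y = (y_s + 1) * (H - 1) / 2
    x0 = np.clip(np.floor(x).astype(int), 0, W - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, H - 2)
    wx, wy = x - x0, y - y0
    return ((1 - wy) * (1 - wx) * U[y0, x0]
            + (1 - wy) * wx * U[y0, x0 + 1]
            + wy * (1 - wx) * U[y0 + 1, x0]
            + wy * wx * U[y0 + 1, x0 + 1])

# Sanity check: the identity transform reproduces the input feature map.
theta_id = np.array([[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0]])
U = np.arange(16, dtype=float).reshape(4, 4)
x_s, y_s = affine_grid(theta_id, 4, 4)
V = bilinear_sample(U, x_s, y_s)
```

Because every step is made of differentiable operations, gradients flow both into the feature map and into `theta`, which is what lets the transformer be trained jointly with the rest of the network.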