Hard-Attention for Scalable Image Classification

Neural Information Processing Systems 

We compare our model against hard-attention baselines on ImageNet, achieving higher accuracy with fewer resources (FLOPs, processing time, and memory).
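The resource savings come from the core idea of hard attention: rather than processing the full image, a cheap scorer selects a few salient regions and only those are fed to the expensive classifier. The following is a minimal toy sketch of that pipeline, not the paper's actual model; the patch-variance scorer and random-projection "classifier" are hypothetical stand-ins for learned networks.

```python
import numpy as np

def hard_attention_classify(image, patch_size=4, top_k=2, num_classes=3, seed=0):
    """Toy hard-attention pipeline (illustrative only): score patches with a
    cheap proxy, then run the "expensive" classifier on the top-k patches."""
    rng = np.random.default_rng(seed)
    H, W = image.shape
    # Split the image into non-overlapping patches.
    patches = [image[i:i + patch_size, j:j + patch_size]
               for i in range(0, H, patch_size)
               for j in range(0, W, patch_size)]
    # Cheap saliency proxy: patch variance (stand-in for a small scorer net).
    scores = np.array([p.var() for p in patches])
    # Hard attention: keep only the k highest-scoring patches; the rest are
    # never processed, which is where the FLOPs/memory savings come from.
    keep = np.argsort(scores)[-top_k:]
    # Placeholder "expensive" classifier: a fixed random projection applied
    # only to the selected patches, with logits averaged over them.
    W_cls = rng.standard_normal((patch_size * patch_size, num_classes))
    logits = np.mean([patches[i].ravel() @ W_cls for i in keep], axis=0)
    return keep, logits

img = np.zeros((8, 8))
img[2:6, 2:6] = np.arange(16.0).reshape(4, 4)  # one high-variance region
keep, logits = hard_attention_classify(img)
print(keep, logits.shape)
```

Because only `top_k` of the patches reach the classifier, compute scales with the number of selected regions rather than with image area, which is the mechanism behind the efficiency comparison above.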