Saliency-based Sequential Image Attention with Multiset Prediction