Appendix - Hard-Attention for Scalable Image Classification