Review for NeurIPS paper: Glance and Focus: a Dynamic Approach to Reducing Spatial Redundancy in Image Classification

Neural Information Processing Systems 

Weaknesses: Post-rebuttal edit: The extra experiments comparing against a random baseline did convince me that GFNet does something beyond random. However, I am still not convinced that GFNets are particularly smart at glancing. For instance, Table 1 of the rebuttal suggests that, to reach state-of-the-art accuracy, a fovea/glance of size 1/n of the original window needs about n steps. To me this suggests that glancing barely pays for itself. In fact, had you replaced random foveation with deterministic uniform coverage of the image, you might have done better.
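
To make the arithmetic behind this concern concrete, here is a minimal sketch. It rests on two assumptions that are mine rather than the paper's: that "size 1/n" means 1/n of the image area, and that compute scales linearly with the number of pixels processed. The function names are hypothetical and do not come from the GFNet code.

```python
from fractions import Fraction


def relative_cost(patch_fraction, steps):
    """Cost of `steps` glances relative to one full-image pass.

    Assumption: cost is linear in the number of pixels processed,
    and each glance covers `patch_fraction` of the image area.
    """
    return patch_fraction * steps


def uniform_grid_patches(height, width, rows, cols):
    """Deterministic, non-overlapping grid of patches over the image.

    Returns (top, left, patch_height, patch_width) tuples in row-major
    order. Assumes height and width divide evenly by rows and cols.
    """
    patch_h, patch_w = height // rows, width // cols
    return [
        (r * patch_h, c * patch_w, patch_h, patch_w)
        for r in range(rows)
        for c in range(cols)
    ]


if __name__ == "__main__":
    # n glances of area 1/n add up to the cost of one full pass.
    for n in (2, 4, 9):
        print(n, relative_cost(Fraction(1, n), n))

    # The uniform-coverage baseline: a 2x2 grid over a 224x224 image.
    print(uniform_grid_patches(224, 224, 2, 2))
```

Under these assumptions the relative cost is exactly 1 for every n, so the glances save nothing over a single full-resolution pass, and the grid function is the kind of deterministic uniform-coverage baseline I would like to see compared against.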