Reviews: Stacked Semantics-Guided Attention Model for Fine-Grained Zero-Shot Learning

Neural Information Processing Systems 

Summary This paper presents a stacked semantics-guided attention (S2GA) model for improved zero-shot learning. The main idea is that important regions should contribute more to the prediction. To this end, the authors design an attention method that assigns different weights to different regions according to their relevance to the class semantic features, and then integrates the global visual features and the weighted region features into more semantics-relevant features to represent images.

Strengths The method is well motivated. The presentation of the method is clear. Using stacked attention for zero-shot learning seems to be a new idea (I did not check exhaustively).
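
For concreteness, the mechanism described in the summary can be sketched roughly as follows. This is only an illustration of a single attention layer, not the authors' formulation: the dot-product scoring, the additive fusion, the assumption that the class semantic vector has already been projected to the visual feature dimension, and the names `attend` and `semantic_proj` are all hypothetical choices made for this sketch.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(regions, global_feat, semantic_proj):
    """One semantics-guided attention step (illustrative only).

    regions:       list of R region feature vectors, each of length D
    global_feat:   global visual feature vector of length D
    semantic_proj: class semantic vector, ASSUMED already projected to length D
    """
    # Relevance of each region to the semantic guidance (assumed: dot product).
    scores = [
        sum(r * s for r, s in zip(region, semantic_proj))
        for region in regions
    ]
    # Normalize the scores so that the region weights sum to 1.
    weights = softmax(scores)
    # Weighted sum of the region features.
    dim = len(global_feat)
    attended = [
        sum(w * region[d] for w, region in zip(weights, regions))
        for d in range(dim)
    ]
    # Fuse global and weighted region features (assumed: element-wise addition).
    fused = [g + a for g, a in zip(global_feat, attended)]
    return weights, fused


# Toy input: three regions with two-dimensional features.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
global_feat = [0.5, 0.5]
semantic_proj = [2.0, 0.0]
weights, fused = attend(regions, global_feat, semantic_proj)
```

In this toy input, the first and third regions align with the semantic vector and therefore receive larger weights than the second. A stacked variant would presumably repeat this step, feeding the fused feature into a further attention layer.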