I2DFormer: Learning Image to Document Attention for Zero-Shot Image Classification

Neural Information Processing Systems 

Despite tremendous progress in zero-shot learning (ZSL), most existing methods still rely on human-annotated attributes, which are difficult to collect and scale. An unsupervised alternative is to represent each class using the word embedding associated with its semantic class name.