Plotting

 Pang, Yanwei


Stacked Semantics-Guided Attention Model for Fine-Grained Zero-Shot Learning

Neural Information Processing Systems

Zero-Shot Learning (ZSL) is generally achieved via aligning the semantic relationships between the visual features and the corresponding class semantic descriptions. However, using the global features to represent fine-grained images may lead to sub-optimal results since they neglect the discriminative differences of local regions. Besides, different regions contain distinct discriminative information. The important regions should contribute more to the prediction. To this end, we propose a novel stacked semantics-guided attention (S2GA) model to obtain semantic relevant features by using individual class semantic features to progressively guide the visual features to generate an attention map for weighting the importance of different local regions. Feeding both the integrated visual features and the class semantic features into a multi-class classification architecture, the proposed framework can be trained end-to-end. Extensive experimental results on CUB and NABird datasets show that the proposed approach has a consistent improvement on both fine-grained zero-shot classification and retrieval tasks.


Simultaneously Learning Neighborship and Projection Matrix for Supervised Dimensionality Reduction

arXiv.org Machine Learning

Explicitly or implicitly, most of dimensionality reduction methods need to determine which samples are neighbors and the similarity between the neighbors in the original highdimensional space. The projection matrix is then learned on the assumption that the neighborhood information (e.g., the similarity) is known and fixed prior to learning. However, it is difficult to precisely measure the intrinsic similarity of samples in high-dimensional space because of the curse of dimensionality. Consequently, the neighbors selected according to such similarity might and the projection matrix obtained according to such similarity and neighbors are not optimal in the sense of classification and generalization. To overcome the drawbacks, in this paper we propose to let the similarity and neighbors be variables and model them in low-dimensional space. Both the optimal similarity and projection matrix are obtained by minimizing a unified objective function. Nonnegative and sum-to-one constraints on the similarity are adopted. Instead of empirically setting the regularization parameter, we treat it as a variable to be optimized. It is interesting that the optimal regularization parameter is adaptive to the neighbors in low-dimensional space and has intuitive meaning. Experimental results on the YALE B, COIL-100, and MNIST datasets demonstrate the effectiveness of the proposed method.