Adaptive Cross-Modal Few-shot Learning

Chen Xing, Negar Rostamzadeh, Boris Oreshkin, Pedro O. O. Pinheiro

Neural Information Processing Systems 

Metric-based meta-learning techniques have been successfully applied to few-shot classification problems. In this paper, we propose to leverage cross-modal information to enhance metric-based few-shot learning methods. Visual and semantic feature spaces have different structures by definition. For certain concepts, visual features might be richer and more discriminative than textual ones, while for others the inverse might be true. Moreover, when visual support is limited in image classification, semantic representations (learned from unsupervised text corpora) can provide strong prior knowledge and context to aid learning.
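To make the idea of enhancing a metric-based few-shot classifier with a semantic prior concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' method. Each class prototype is the mean of its few visual support features, as in prototype-based metric learners. That prototype is blended with a per-class semantic embedding through a convex combination, and a query is assigned to the nearest blended prototype under squared Euclidean distance. The helper names (`mean`, `mix`, `classify`) and the per-class `weights` dictionary are hypothetical. The per-class weight reflects the abstract's point that visual features may be more informative for some concepts and semantic ones for others. The sketch assumes the semantic embeddings already live in the same space as the visual features, whereas a real system would need some learned mapping between the two.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def mean(vectors: List[Vector]) -> List[float]:
    """Element-wise mean of equally sized vectors (the visual class prototype)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def sq_euclidean(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mix(visual: Vector, semantic: Vector, lam: float) -> List[float]:
    """Convex combination lam * visual + (1 - lam) * semantic.

    Assumption: both vectors are already in the same embedding space.
    """
    return [lam * v + (1.0 - lam) * s for v, s in zip(visual, semantic)]


def classify(
    query: Vector,
    support: Dict[str, List[Vector]],
    semantic: Dict[str, Vector],
    weights: Dict[str, float] = None,
):
    """Nearest-prototype classification with semantically blended prototypes.

    `weights` is a hypothetical per-class visual weight in [0, 1]; classes
    without an entry default to 0.5 (equal visual/semantic contribution).
    """
    weights = weights or {}
    prototypes = {
        c: mix(mean(feats), semantic[c], weights.get(c, 0.5))
        for c, feats in support.items()
    }
    # Softmax over negative squared distances (max-shifted for stability).
    scores = {c: -sq_euclidean(query, p) for c, p in prototypes.items()}
    top = max(scores.values())
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exps.values())
    probs = {c: e / total for c, e in exps.items()}
    return max(probs, key=probs.get), probs


if __name__ == "__main__":
    support = {"cat": [[1.0, 0.0], [0.8, 0.2]], "dog": [[0.0, 1.0]]}
    semantic = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}
    label, probs = classify([0.9, 0.1], support, semantic)
    print(label)  # prints cat
```

Setting a class weight to 1.0 recovers a purely visual prototype, and setting it to 0.0 relies only on the semantic embedding. An adaptive method would predict such a weight per class rather than fixing it by hand as this sketch does.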