K-L ITE: Learning Transferable Visual Models with External Knowledge

Neural Information Processing Systems 

The new generation of state-of-the-art computer vision systems are trained from natural language supervision, ranging from simple object category names to descriptive captions. This form of supervision ensures high generality and usability of the learned visual models, due to the broad concept coverage achieved via largescale data collection process. Alternatively, we argue that learning with external knowledge is a promising way which leverages a much more structured source of supervision and offers sample efficiency.