Image-Caption Encoding for Improving Zero-Shot Generalization