Training and Evaluating Multimodal Word Embeddings with Large-scale Web Annotated Images