WIT: Wikipedia-based Image Text Dataset for Multimodal Multilingual Machine Learning
