Supplementary Material - WikiDO: A New Benchmark Evaluating Cross-Modal Retrieval for Vision-Language Models