Data leakage in cross-modal retrieval training: A case study