Supplemental Material for Learning with Noisy Correspondence for Cross-modal Matching