Coarse-to-fine Alignment Makes Better Speech-image Retrieval
