Self-supervised Contrastive Cross-Modality Representation Learning for Spoken Question Answering
