TCT: A Cross-supervised Learning Method for Multimodal Sequence Representation
