TACo: Token-aware Cascade Contrastive Learning for Video-Text Alignment
