LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
