Achieving Cross Modal Generalization with Multimodal Unified Representation

Yan Xia, Hai Huang

Neural Information Processing Systems 

During pre-training, we investigate various modality combinations, including audio-visual, audio-text, and the tri-modal combination of audio-visual-text.
