Achieving Cross Modal Generalization with Multimodal Unified Representation
Yan Xia, Hai Huang
–Neural Information Processing Systems
During pre-training, we investigate various modality combinations, including audio-visual, audio-text, and the tri-modal combination of audio-visual-text.
Oct-9-2025, 07:16:38 GMT
- Country:
  - Asia
    - China > Shanghai
      - Shanghai (0.04)
    - Middle East > Israel (0.04)
  - Europe
    - Netherlands > North Holland
      - Amsterdam (0.04)
    - Switzerland > Zürich
      - Zürich (0.14)
- Technology: