Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts

Neural Information Processing Systems 

Specifically, we introduce Multiway Transformer, where each block contains a pool of modality-specific experts and a shared self-attention layer.
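
The block structure described above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the class name `MultiwayBlock`, the single-head attention, the ReLU feed-forward experts, the pre-layer-norm arrangement, and the routing of each token by a known modality label are all assumptions made for the example.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    # Normalize each token vector to zero mean and unit variance (no learned scale).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)


class MultiwayBlock:
    """Hypothetical block: shared self-attention plus one feed-forward expert per modality."""

    def __init__(self, dim, hidden, modalities=("vision", "language"), seed=0):
        rng = np.random.default_rng(seed)

        def init(*shape):
            return rng.normal(0.0, 0.02, size=shape)

        # Self-attention weights are shared by every token, whatever its modality.
        self.wq, self.wk, self.wv, self.wo = (init(dim, dim) for _ in range(4))
        # Pool of modality-specific experts: one two-layer feed-forward net each.
        self.experts = {m: (init(dim, hidden), init(hidden, dim)) for m in modalities}

    def attention(self, x):
        q, k, v = x @ self.wq, x @ self.wk, x @ self.wv
        weights = softmax(q @ k.T / np.sqrt(x.shape[-1]))
        return weights @ v @ self.wo

    def forward(self, x, modality_ids):
        # x: (seq_len, dim); modality_ids: one modality name per token (assumed known).
        unknown = set(modality_ids) - set(self.experts)
        if unknown:
            raise ValueError(f"no expert for modalities: {sorted(unknown)}")
        x = x + self.attention(layer_norm(x))  # shared across modalities
        h = layer_norm(x)
        out = np.zeros_like(x)
        ids = np.asarray(modality_ids)
        for name, (w1, w2) in self.experts.items():
            mask = ids == name
            if mask.any():
                # Only tokens of this modality pass through this expert.
                out[mask] = np.maximum(h[mask] @ w1, 0.0) @ w2
        return x + out
```

In this sketch, image and text tokens attend to each other through the same attention weights, while the feed-forward computation is kept separate per modality, which is one plausible reading of "a pool of modality-specific experts and a shared self-attention layer".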
