On the Benefits of Early Fusion in Multimodal Representation Learning
