Bridging Vision and Language Spaces with Assignment Prediction
