It's a (Blind) Match! Towards Vision-Language Correspondence without Parallel Data
