Going Beyond Nouns With Vision & Language Models Using Synthetic Data
