Improving Visual Commonsense in Language Models via Multiple Image Generation
