Retrieve, Caption, Generate: Visual Grounding for Enhancing Commonsense in Text Generation Models
