SGEITL: Scene Graph Enhanced Image-Text Learning for Visual Commonsense Reasoning
