Does Visual Pretraining Help End-to-End Reasoning?
