Does Visual Pretraining Help End-to-End Reasoning?