Does Visual Pretraining Help End-to-End Reasoning?

Neural Information Processing Systems 

We investigate whether general-purpose neural networks, with the help of visual pretraining, can learn visual reasoning end to end. A positive result would refute the common belief that explicit visual abstraction (e.g.