Large-scale Pretraining for Visual Dialog: A Simple State-of-the-Art Baseline