Is Large-Scale Pretraining the Secret to Good Domain Generalization?