Towards Understanding the Effect of Pretraining Label Granularity