CHIP: Contrastive Hierarchical Image Pretraining