Contrastive Localized Language-Image Pre-Training