TULIP: Towards Unified Language-Image Pretraining