e-CLIP: Large-Scale Vision-Language Representation Learning in E-commerce