Masked Contrastive Pre-Training for Efficient Video-Text Retrieval