Long-Short Temporal Contrastive Learning of Video Transformers