HiT: Hierarchical Transformer with Momentum Contrast for Video-Text Retrieval