Align before Fuse: Vision and Language Representation Learning with Momentum Distillation