Self-supervised Video Representation Learning by Context and Motion Decoupling