Static and Dynamic Concepts for Self-supervised Video Representation Learning