Self-Supervised Video Representation Learning via Latent Time Navigation