Visual Representation Learning with Stochastic Frame Prediction