A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning