Video Prediction of Dynamic Physical Simulations With Pixel-Space Spatiotemporal Transformers