Unsupervised Video Prediction from a Single Frame by Estimating 3D Dynamic Scene Structure