Object-Centric Representation Learning with Generative Spatial-Temporal Factorization