Learning Disentangled Representation in Object-Centric Models for Visual Dynamics Prediction via Transformers
