Self-supervised Object-Centric Learning for Videos

Görkay Aydemir

Neural Information Processing Systems 

The training objective is to reconstruct the middle frame from these temporally-aware slots in a high-level semantic feature space. We propose a masking strategy that drops a significant portion of tokens in the feature space, for efficiency and regularization. Additionally, we address over-clustering by merging slots based on similarity.
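
The two mechanisms mentioned above, token dropping and similarity-based slot merging, can be illustrated with a minimal sketch. It is not the authors' implementation. The excerpt does not say how tokens are selected or how similarity is measured, so uniform random dropping, cosine similarity, a fixed threshold, and greedy grouping are all assumptions. The names `drop_tokens`, `merge_slots`, `keep_ratio`, and `threshold` are hypothetical.

```python
import math
import random


def drop_tokens(tokens, keep_ratio, rng):
    """Keep a uniformly random subset of feature tokens (assumed strategy).

    Returns the kept tokens and their original indices, in order.
    """
    n_keep = max(1, int(len(tokens) * keep_ratio))
    kept_idx = sorted(rng.sample(range(len(tokens)), n_keep))
    return [tokens[i] for i in kept_idx], kept_idx


def cosine(a, b):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def merge_slots(slots, threshold):
    """Greedily group slots whose cosine similarity to a group's mean
    is at least `threshold`, then return one mean vector per group.
    (Greedy thresholding is an assumption, chosen for brevity.)
    """
    groups = []
    for slot in slots:
        for group in groups:
            mean = [sum(col) / len(group) for col in zip(*group)]
            if cosine(slot, mean) >= threshold:
                group.append(slot)
                break
        else:
            # No sufficiently similar group exists: start a new one.
            groups.append([slot])
    return [[sum(col) / len(g) for col in zip(*g)] for g in groups]


# Toy usage: 16 two-dimensional tokens, keep a quarter of them.
rng = random.Random(0)
tokens = [[float(i), float(i + 1)] for i in range(16)]
kept, kept_idx = drop_tokens(tokens, keep_ratio=0.25, rng=rng)

# Three toy slots; the first two point in nearly the same direction.
slots = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
merged = merge_slots(slots, threshold=0.9)
```

In this toy example, the first two slots are merged into a single slot and the third stays separate, so redundant slots that describe the same object collapse into one. Dropping tokens shortens the sequence that later stages must process, which is the efficiency benefit the text refers to.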
