Self-supervised Object-Centric Learning for Videos Görkay Aydemir
–Neural Information Processing Systems
From these temporally-aware slots, the training objective is to reconstruct the middle frame in a high-level semantic feature space. We propose a masking strategy by dropping a significant portion of tokens in the feature space for efficiency and regularization. Additionally, we address over-clustering by merging slots based on similarity.
Neural Information Processing Systems
Feb-13-2026, 03:01:24 GMT
- Genre:
- Research Report > New Finding (0.68)
- Technology:
- Information Technology > Artificial Intelligence
- Machine Learning
- Neural Networks > Deep Learning (0.46)
- Statistical Learning (0.93)
- Natural Language (1.00)
- Representation & Reasoning (1.00)
- Vision (1.00)
- Machine Learning
- Information Technology > Artificial Intelligence