Moving Off-the-Grid: Scene-Grounded Video Representations, Yi Yang
Neural Information Processing Systems
Current vision models typically maintain a fixed correspondence between their representation structure and image space. Each layer comprises a set of tokens arranged "on-the-grid," which biases patches or tokens to encode information at a specific spatio(-temporal) location. In this work we present Moving Off-the-Grid (MooG), a self-supervised video representation model that offers an alternative approach, allowing tokens to move "off-the-grid" to better enable them to represent scene elements consistently, even as they move across the image plane through time.
Mar-27-2025, 12:07:31 GMT
- Genre:
- Research Report > Experimental Study (0.93)
- Industry:
- Energy > Power Industry (0.81)
- Information Technology (0.92)
- Technology: