Moving Off-the-Grid: Scene-Grounded Video Representations, Yi Yang

Mar-27-2025, 12:07:31 GMT–Neural Information Processing Systems

Current vision models typically maintain a fixed correspondence between their representation structure and image space. Each layer comprises a set of tokens arranged "on-the-grid," which biases patches or tokens to encode information at a specific spatio(-temporal) location. In this work we present Moving Off-the-Grid (MooG), a self-supervised video representation model that offers an alternative approach, allowing tokens to move "off-the-grid" to better enable them to represent scene elements consistently, even as they move across the image plane through time.

large language model, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Mar-27-2025, 12:07:31 GMT

Conferences PDF

Add feedback

Genre:
- Research Report > Experimental Study (0.93)

Industry:
- Energy > Power Industry (0.81)
- Information Technology (0.92)

Technology:
- Information Technology
  - Artificial Intelligence
    - Machine Learning > Neural Networks
      - Deep Learning (0.93)
    - Natural Language > Large Language Model (0.67)
    - Vision (1.00)
  - Sensing and Signal Processing > Image Processing (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found