DINO-Foresight: Looking into the Future with DINO
–Neural Information Processing Systems
Predicting future dynamics is crucial for applications like autonomous driving and robotics, where understanding the environment is key. Existing pixel-level methods are computationally expensive and often focus on irrelevant details. To address these challenges, we introduce DINO-Foresight, a novel framework that operates in the semantic feature space of pretrained Vision Foundation Models (VFMs). Our approach trains a masked feature transformer in a self-supervised manner to predict the evolution of VFM features over time. By forecasting these features, we can apply off-the-shelf, task-specific heads for various scene understanding tasks. In this framework, VFM features are treated as a latent space, to which different heads attach to perform specific tasks for future-frame analysis. Extensive experiments show the very strong performance, robustness and scalability of our framework.
Neural Information Processing Systems
Jun-23-2026, 02:04:09 GMT
- Country:
- Europe (0.28)
- Genre:
- Research Report
- New Finding (1.00)
- Experimental Study (1.00)
- Research Report
- Industry:
- Information Technology > Robotics & Automation (0.34)
- Transportation > Ground
- Road (0.48)
- Technology:
- Information Technology > Artificial Intelligence
- Vision (1.00)
- Representation & Reasoning (1.00)
- Natural Language (1.00)
- Robots (0.86)
- Machine Learning
- Neural Networks (1.00)
- Statistical Learning (0.68)
- Information Technology > Artificial Intelligence