VideoTitans: Scalable Video Prediction with Integrated Short- and Long-term Memory

Jun-19-2026, 19:01:53 GMT–Neural Information Processing Systems

Accurate video forecasting enables autonomous vehicles to anticipate hazards, robotics and surveillance systems to predict human intent, and environmental models to issue timely warnings for extreme weather events. However, existing methods remain limited. Transformers rely on global attention with quadratic complexity, which makes them impractical for high-resolution, long-horizon video prediction. Convolutional and recurrent networks suffer from short-range receptive fields and vanishing gradients, so they lose key information over extended sequences. To overcome these challenges, we introduce VideoTitans, the first architecture to adapt the gradient-driven Titans memory, originally designed for language modelling, to video prediction. VideoTitans integrates three core ideas: (i) a sliding-window attention core that scales linearly with sequence length and spatial resolution, (ii) an episodic memory that dynamically retains only informative tokens based on a gradient-based surprise signal, and (iii) a small set of persistent tokens encoding task-specific priors that stabilize training and enhance generalization.
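Ideas (i) and (ii) can be illustrated with a minimal NumPy sketch. This is not the VideoTitans implementation: the function names, the causal window, the linear associative memory, the learning rate, and the top-quartile retention rule are all assumptions chosen for illustration, and the abstract does not specify any of them. The sketch shows only why windowed attention costs O(n * window) rather than O(n^2), and how a gradient norm can serve as a per-token surprise score.

```python
import numpy as np


def sliding_window_attention(x, window):
    """Causal local self-attention (illustrative, single head, no projections).

    Token t attends only to tokens max(0, t - window + 1) .. t, so the total
    work is O(n * window) instead of O(n^2) for global attention.
    """
    n, d = x.shape
    out = np.empty_like(x)
    for t in range(n):
        lo = max(0, t - window + 1)
        ctx = x[lo:t + 1]                        # at most `window` rows
        scores = ctx @ x[t] / np.sqrt(d)         # scaled dot-product scores
        weights = np.exp(scores - scores.max())  # numerically stable softmax
        weights /= weights.sum()
        out[t] = weights @ ctx
    return out


def surprise_scores(keys, values, lr=0.1):
    """Gradient-norm surprise for a linear memory M (assumed simplification).

    For each token, the loss is 0.5 * ||M k - v||^2. Its gradient with respect
    to M is outer(M k - v, k); the norm of that gradient is the surprise.
    The memory then takes one plain gradient step on the token.
    """
    memory = np.zeros((values.shape[1], keys.shape[1]))
    scores = []
    for k, v in zip(keys, values):
        err = memory @ k - v
        grad = np.outer(err, k)
        scores.append(np.linalg.norm(grad))
        memory -= lr * grad
    return np.array(scores)


# Hypothetical usage on random "tokens" (stand-ins for video patch embeddings).
rng = np.random.default_rng(0)
tokens = rng.normal(size=(16, 8))

local_out = sliding_window_attention(tokens, window=4)
surprise = surprise_scores(tokens, tokens)

# Assumed retention rule: keep the top quarter of tokens by surprise.
keep_mask = surprise >= np.quantile(surprise, 0.75)
```

A token the memory already predicts well produces a small gradient and is a candidate for being dropped, while a poorly predicted token produces a large gradient and is retained. Feeding the same key and value repeatedly makes the score shrink at every step, which is the behaviour a surprise signal is expected to have.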

Genre:
- Overview (0.68)
- Research Report
  - Experimental Study (1.00)
  - New Finding (0.68)

Industry:
- Information Technology > Security & Privacy (0.48)
- Health & Medicine > Consumer Health (0.34)

Technology:
- Information Technology
  - Data Science > Data Mining (0.93)
  - Artificial Intelligence
    - Vision (1.00)
    - Robots (1.00)
    - Representation & Reasoning (1.00)
    - Natural Language (1.00)
    - Cognitive Science (1.00)
    - Machine Learning > Neural Networks
      - Deep Learning (1.00)
