Video Occupancy Models

Tomar, Manan, Hansen-Estruch, Philippe, Bachman, Philip, Lamb, Alex, Langford, John, Taylor, Matthew E., Levine, Sergey

Jun-25-2024–arXiv.org Artificial Intelligence

We introduce a new family of video prediction models designed to support downstream control tasks. We call these models Video Occupancy models (VOCs). VOCs operate in a compact latent space, thus avoiding the need to make predictions about individual pixels. Unlike prior latent-space world models, VOCs directly predict the discounted distribution of future states in a single step, thus avoiding the need for multistep roll-outs. We show that both properties are beneficial when building predictive models of video for use in downstream control.

prediction, representation, representation space, (14 more...)

arXiv.org Artificial Intelligence

Jun-25-2024

arXiv.org PDF

Add feedback

Country:
- North America > Canada > Alberta (0.14)

Genre:
- Research Report (0.40)

Technology:
- Information Technology > Artificial Intelligence
  - Vision (1.00)
  - Representation & Reasoning (1.00)
  - Natural Language > Large Language Model (0.69)
  - Machine Learning
    - Reinforcement Learning (1.00)
    - Neural Networks > Deep Learning (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found