Flexible Diffusion Modeling of Long Videos

Harvey, William, Naderiparizi, Saeid, Masrani, Vaden, Weilbach, Christian, Wood, Frank

Dec-15-2022 – arXiv.org Artificial Intelligence

We present a framework for video modeling based on denoising diffusion probabilistic models that produces long-duration video completions in a variety of realistic environments. We introduce a generative model that can, at test time, sample any subset of video frames conditioned on any other subset, and we present an architecture adapted for this purpose. Doing so allows us to efficiently compare and optimize a variety of schedules for the order in which frames in a long video are sampled, and to use selective sparse and long-range conditioning on previously sampled frames. We demonstrate improved video modeling over prior work on a number of datasets and sample temporally coherent videos over 25 minutes in length. We additionally release a new video modeling dataset and semantically meaningful metrics based on videos generated in the CARLA autonomous driving simulator.
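The idea of a sampling schedule mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`pick_context`, `sampling_schedule`) and parameters (`window`, `recent`, `far`) are hypothetical, and the rule for choosing distant frames is an assumption made only for illustration. Each stage lists which frame indices would be sampled and which previously available frames they would be conditioned on, mixing recent frames with a few sparse long-range ones.

```python
from typing import List, Tuple


def pick_context(known: List[int], recent: int, far: int) -> List[int]:
    """Choose conditioning frames: the last `recent` known frames plus up to
    `far` evenly strided earlier frames (an illustrative rule, not the paper's)."""
    near = known[-recent:] if recent > 0 else []
    earlier = known[:len(known) - len(near)]
    if far <= 0 or not earlier:
        return near
    step = max(1, len(earlier) // far)
    distant = earlier[::step][:far]
    return distant + near


def sampling_schedule(
    num_frames: int, num_observed: int, window: int, recent: int, far: int
) -> List[Tuple[List[int], List[int]]]:
    """Return stages of (frames_to_sample, frames_to_condition_on).

    Frames 0..num_observed-1 are assumed given; the rest are sampled in
    consecutive windows of at most `window` frames.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    known = list(range(num_observed))
    stages = []
    start = num_observed
    while start < num_frames:
        latent = list(range(start, min(start + window, num_frames)))
        stages.append((latent, pick_context(known, recent, far)))
        known.extend(latent)  # sampled frames become available as context
        start += window
    return stages


if __name__ == "__main__":
    for latent, context in sampling_schedule(10, 2, 3, recent=2, far=2):
        print("sample", latent, "given", context)
```

In a real system each stage would correspond to one run of the conditional diffusion sampler. Because the model described above accepts arbitrary subsets, different schedules of this kind could be compared without retraining.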

artificial intelligence, machine learning, video, (17 more...)

Country:
- North America > Canada
  - Quebec > Montreal (0.14)
  - British Columbia > Metro Vancouver Regional District
    - Vancouver (0.04)

Genre:
- Research Report (1.00)

Industry:
- Information Technology (0.48)
- Transportation > Ground
  - Road (0.34)

Technology:
- Information Technology
  - Sensing and Signal Processing > Image Processing (0.93)
  - Artificial Intelligence
    - Vision (1.00)
    - Representation & Reasoning (1.00)
    - Machine Learning > Neural Networks
      - Deep Learning (1.00)
