Mobile Video Diffusion
Haitam Ben Yahia, Denis Korzhenkov, Ioannis Lelekas, Amir Ghodrati, Amirhossein Habibian
arXiv.org Artificial Intelligence
Video diffusion models have achieved impressive realism and controllability but are limited by high computational demands, restricting their use on mobile devices. This paper introduces the first mobile-optimized video diffusion model. Starting from the spatio-temporal UNet of Stable Video Diffusion (SVD), we reduce memory and computational cost by lowering the frame resolution, incorporating multi-scale temporal representations, and introducing two novel pruning schemas that reduce the number of channels and temporal blocks. Furthermore, we employ adversarial finetuning to reduce denoising to a single step. Our model, coined MobileVD, is 523x more efficient (1817.2 vs. 4.34 TFLOPs) with a slight quality drop (FVD 149 vs. 171), generating latents for a 14x512x256 px clip in 1.7 seconds on a Xiaomi 14 Pro. Our results are available at https://qualcomm-ai-research.github.io/mobile-video-diffusion/
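The channel-pruning idea mentioned in the abstract can be illustrated with a toy structured-pruning sketch. The function name `prune_channels`, the L1-norm importance score, and the keep ratio below are illustrative assumptions for a generic magnitude-based schema, not MobileVD's actual pruning method:

```python
import numpy as np

def prune_channels(weight, keep_ratio=0.5):
    """Toy structured channel pruning: rank the output channels of a
    conv weight tensor (out_ch, in_ch, kh, kw) by L1 norm and keep the
    top fraction. Illustrative only, not MobileVD's pruning schema."""
    out_ch = weight.shape[0]
    # Importance score per output channel: sum of absolute weights.
    scores = np.abs(weight).reshape(out_ch, -1).sum(axis=1)
    n_keep = max(1, int(out_ch * keep_ratio))
    # Indices of the highest-scoring channels, in original order.
    keep = np.sort(np.argsort(scores)[::-1][:n_keep])
    return weight[keep], keep

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 32, 3, 3))   # a hypothetical conv layer
pruned, kept = prune_channels(w, keep_ratio=0.25)
print(pruned.shape)  # (16, 32, 3, 3)
```

In practice, pruning a layer's output channels also requires slicing the matching input channels of the next layer and finetuning to recover quality; the sketch above shows only the channel-selection step.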
Dec-10-2024
- Genre:
- Research Report > New Finding (0.34)
- Industry:
- Information Technology (0.93)
- Technology:
  - Information Technology
    - Artificial Intelligence
      - Machine Learning > Neural Networks > Deep Learning (0.68)
      - Natural Language (0.93)
      - Representation & Reasoning (1.00)
      - Vision (0.93)
    - Communications (1.00)