Enabling Versatile Controls for Video Diffusion Models

Xu Zhang, Hao Zhou, Haoming Qin, Xiaobin Lu, Jiaxing Yan, Guanzhong Wang, Zeyu Chen, Yi Liu

Mar-21-2025–arXiv.org Artificial Intelligence

Despite substantial progress in text-to-video generation, achieving precise and flexible control over fine-grained spatiotemporal attributes remains a significant unresolved challenge in video generation research. To address these limitations, we introduce VCtrl (also termed PP-VCtrl), a novel framework designed to enable fine-grained control over pre-trained video diffusion models in a unified manner. Additionally, we design a unified control signal encoding pipeline and a sparse residual connection mechanism to efficiently incorporate control representations. Comprehensive experiments and human evaluations demonstrate that VCtrl effectively enhances controllability and generation quality. The source code and pre-trained models are publicly available and implemented using the PaddlePaddle framework at https://

artificial intelligence, diffusion model, machine learning
