Less is More: Data-Efficient Adaptation for Controllable Text-to-Video Generation
Shihan Cheng, Nilesh Kulkarni, David Hyde, Dmitriy Smirnov
–arXiv.org Artificial Intelligence
Fine-tuning large-scale text-to-video diffusion models to add new generative controls, such as those over physical camera parameters (e.g., shutter speed or aperture), typically requires vast, high-fidelity datasets that are difficult to acquire. In this work, we propose a data-efficient fine-tuning strategy that learns these controls from sparse, low-quality synthetic data. We show that fine-tuning on such simple data not only enables the desired controls but actually yields superior results to models fine-tuned on photorealistic "real" data. Beyond demonstrating these results, we provide a framework that justifies this phenomenon both intuitively and quantitatively.
Dec-12-2025