SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation
Namekata, Koichi, Bahmani, Sherwin, Wu, Ziyi, Kant, Yash, Gilitschenski, Igor, Lindell, David B.
Methods for image-to-video generation have achieved impressive, photo-realistic quality. However, adjusting specific elements in generated videos, such as object motion or camera movement, is often a tedious process of trial and error, e.g., re-generating videos with different random seeds. Recent techniques address this issue by fine-tuning a pre-trained model to follow conditioning signals, such as bounding boxes or point trajectories. Yet, this fine-tuning procedure can be computationally expensive, and it requires datasets with annotated object motion, which can be difficult to procure. In this work, we introduce SG-I2V, a framework for controllable image-to-video generation that is self-guided, offering zero-shot control by relying solely on the knowledge present in a pre-trained image-to-video diffusion model, without the need for fine-tuning or external knowledge. Our zero-shot method outperforms unsupervised baselines while significantly narrowing the performance gap with supervised models in terms of visual quality and motion fidelity.
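The abstract does not detail the mechanism, but a common pattern for this kind of training-free ("self-guided") control is to optimize the noisy latent at each denoising step against a control objective while leaving the model weights frozen. The sketch below illustrates only that general pattern; `ToyDenoiser`, `motion_loss`, the latent shapes, and all hyperparameters are illustrative placeholders, not SG-I2V's actual architecture or objective.

```python
import torch


class ToyDenoiser(torch.nn.Module):
    """Stand-in for a frozen, pre-trained image-to-video diffusion model."""

    def __init__(self, channels: int = 4):
        super().__init__()
        self.net = torch.nn.Conv3d(channels, channels, kernel_size=3, padding=1)

    def forward(self, z_t: torch.Tensor, t: int) -> torch.Tensor:
        # Real models condition on the timestep t; this toy one ignores it.
        return self.net(z_t)


def motion_loss(features: torch.Tensor, region: torch.Tensor) -> torch.Tensor:
    """Hypothetical control objective: keep feature content inside a
    user-specified region consistent with the first frame, as a crude
    proxy for following a target trajectory."""
    anchor = features[:, :, 0:1]  # first frame, broadcast over frames
    return ((features - anchor) * region).pow(2).mean()


denoiser = ToyDenoiser().eval()
for p in denoiser.parameters():
    p.requires_grad_(False)  # the model itself is never fine-tuned

z = torch.randn(1, 4, 8, 32, 32)  # latent video: (batch, ch, frames, H, W)
region = torch.zeros_like(z)
region[..., 8:16, 8:16] = 1.0  # bounding box the user wants to control

num_steps, guidance_lr = 50, 0.1
for t in range(num_steps, 0, -1):
    # Guidance step: optimize the *latent* (not the weights) so the
    # model's features satisfy the control objective.
    z = z.detach().requires_grad_(True)
    feats = denoiser(z, t)  # stand-in for internal feature maps
    (grad,) = torch.autograd.grad(motion_loss(feats, region), z)
    z = (z - guidance_lr * grad).detach()

    # Ordinary denoising update (greatly simplified, illustrative only).
    with torch.no_grad():
        z = z - (1.0 / num_steps) * denoiser(z, t)
```

Because only the latent is updated, this pattern requires no motion-annotated training data, which is the practical advantage the abstract highlights over fine-tuning-based approaches.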
arXiv.org Artificial Intelligence
Dec-4-2024
- Country:
- North America > Canada > Ontario > Toronto (0.14)
- Genre:
- Research Report (0.82)
- Industry:
  - Media > Film (0.35)
  - Media > Photography (0.49)
  - Media > Television (0.35)
- Technology:
  - Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
  - Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)
  - Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
  - Information Technology > Artificial Intelligence > Vision (1.00)