LayerT2V: Interactive Multi-Object Trajectory Layering for Video Generation
Cen, Kangrui, Zhao, Baixuan, Xin, Yi, Luo, Siqi, Zhai, Guangtao, Liu, Xiaohong
arXiv.org Artificial Intelligence
Controlling object motion trajectories in Text-to-Video (T2V) generation is a challenging and relatively under-explored problem, particularly in scenes involving multiple moving objects. Most community models and datasets in the T2V domain are designed for single-object motion, which limits the performance of current generative models on multi-object tasks. In addition, existing motion control methods in T2V either lack support for multi-object motion scenes or suffer severe performance degradation when object trajectories intersect, primarily because of semantic conflicts in the colliding regions. To address these limitations, we introduce LayerT2V, the first approach for generating video by compositing background and foreground objects layer by layer. This layered generation enables flexible integration of multiple independent elements within a video: each element is positioned on a distinct "layer", which facilitates coherent multi-object synthesis and enhances control over the generation process. Extensive experiments demonstrate the superiority of LayerT2V in generating complex multi-object scenarios, showcasing 1.4 and 4.5 improvements in mIoU and AP50 metrics, respectively, over state-of-the-art (SOTA) methods.
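The abstract reports results in mIoU and AP50 without spelling out how such scores are computed. As a rough illustration only, the sketch below scores one object's trajectory by comparing the target box in each frame with a box detected in the generated video. It is not the paper's evaluation protocol: the names `box_iou` and `trajectory_scores` are hypothetical, boxes are assumed to be `(x_min, y_min, x_max, y_max)` tuples, and the fraction of frames with IoU of at least 0.5 is used as a simplified stand-in for AP50.

```python
from typing import List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def trajectory_scores(target: List[Box], detected: List[Box]) -> Tuple[float, float]:
    """Return (mean IoU, fraction of frames with IoU >= 0.5) for one object.

    The second value is a simplified proxy for AP50, not a full
    average-precision computation.
    """
    if not target or len(target) != len(detected):
        raise ValueError("need one detected box per target frame")
    ious = [box_iou(t, d) for t, d in zip(target, detected)]
    mean_iou = sum(ious) / len(ious)
    hit_rate_50 = sum(1 for iou in ious if iou >= 0.5) / len(ious)
    return mean_iou, hit_rate_50


# Two-frame toy trajectory: a perfect match, then a box shifted by half its width.
target_boxes = [(0, 0, 10, 10), (10, 0, 20, 10)]
detected_boxes = [(0, 0, 10, 10), (15, 0, 25, 10)]
mean_iou, hit_rate_50 = trajectory_scores(target_boxes, detected_boxes)
print(mean_iou, hit_rate_50)
```

In the toy example the second frame overlaps the target by only half a box width, so its IoU falls below the 0.5 threshold and lowers both scores. A multi-object evaluation would repeat this per object and average the results.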
Aug-7-2025
- Genre:
  - Research Report (1.00)
- Technology:
  - Information Technology > Artificial Intelligence
    - Machine Learning > Neural Networks (0.93)
    - Natural Language (1.00)
    - Vision (1.00)