LayerT2V: Interactive Multi-Object Trajectory Layering for Video Generation

Kangrui Cen, Baixuan Zhao, Yi Xin, Siqi Luo, Guangtao Zhai, Xiaohong Liu

arXiv.org Artificial Intelligence

Controlling object motion trajectories in Text-to-Video (T2V) generation is a challenging and relatively under-explored area, particularly in scenarios involving multiple moving objects. Most community models and datasets in the T2V domain are designed for single-object motion, limiting the performance of current generative models on multi-object tasks. Additionally, existing motion control methods in T2V either lack support for multi-object motion scenes or suffer severe performance degradation when object trajectories intersect, primarily due to semantic conflicts in the colliding regions. To address these limitations, we introduce LayerT2V, the first approach that generates video by compositing background and foreground objects layer by layer. This layered generation enables flexible integration of multiple independent elements within a video, positioning each element on a distinct "layer" and thus facilitating coherent multi-object synthesis while enhancing control over the generation process. Extensive experiments demonstrate the superiority of LayerT2V in generating complex multi-object scenarios, showing improvements of 1.4 and 4.5 in mIoU and AP50, respectively, over state-of-the-art (SOTA) methods.
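The abstract does not specify how the layers are combined; as a minimal illustration of the general idea of layer-by-layer compositing (not the paper's actual pipeline), each foreground layer can be blended over the running result with a per-pixel alpha mask using the standard "over" operator. The function and array shapes below are assumptions for illustration only.

```python
import numpy as np

def composite_layers(layers):
    """Alpha-composite a list of (rgb, alpha) layers, back to front.

    layers: list of ((H, W, 3) float RGB in [0, 1], (H, W, 1) alpha mask)
    pairs; the first entry is the background (its alpha is ignored).
    Hypothetical helper, not part of LayerT2V.
    """
    rgb, _ = layers[0]
    out = rgb.astype(np.float64)
    for fg, alpha in layers[1:]:
        # standard "over" operator: fg where alpha=1, previous result where alpha=0
        out = alpha * fg + (1.0 - alpha) * out
    return out

# Toy frame: a white 2x2 object layer placed over a black background
bg = np.zeros((4, 4, 3))
fg = np.ones((4, 4, 3))
mask = np.zeros((4, 4, 1))
mask[1:3, 1:3] = 1.0  # the object occupies rows/cols 1..2
frame = composite_layers([(bg, None), (fg, mask)])
```

Because each object lives on its own layer, intersecting trajectories only change which layer is on top at a given pixel, rather than mixing the objects' semantics in a shared region.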