Corruption-Aware Training of Latent Video Diffusion Models for Robust Text-to-Video Generation
Maduabuchi, Chika; Chen, Hao; Han, Yujin; Wang, Jindong
arXiv.org Artificial Intelligence
Latent Video Diffusion Models (LVDMs) achieve high-quality generation but are sensitive to imperfect conditioning. On noisy, web-scale video-text datasets, this sensitivity causes semantic drift and temporal incoherence. We introduce CAT-LVDM, the first corruption-aware training framework for LVDMs, which improves robustness through structured, data-aligned noise injection. Our method includes Batch-Centered Noise Injection (BCNI), which perturbs embeddings along intra-batch semantic directions to preserve temporal consistency. BCNI is especially effective on caption-rich datasets such as WebVid-2M, MSR-VTT, and MSVD. We also propose Spectrum-Aware Contextual Noise (SACN), which injects noise along dominant spectral directions to improve low-frequency smoothness and shows strong results on UCF-101. On average, BCNI reduces Fréchet Video Distance (FVD) by 31.9% across WebVid-2M, MSR-VTT, and MSVD, while SACN yields a 12.3% improvement on UCF-101. Ablation studies confirm the benefit of low-rank, data-aligned noise. Our theoretical analysis further explains how such perturbations tighten entropy, Wasserstein, score-drift, mixing-time, and generalization bounds. CAT-LVDM establishes a principled, scalable training approach for robust video diffusion under multimodal noise. Code and models: https://github.com/chikap421/catlvdm
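The abstract describes BCNI and SACN only at a high level. The NumPy sketch below illustrates one plausible reading of each idea applied to a batch of conditioning embeddings. The exact formulas, the function names `bcni` and `sacn`, and the parameters `rho` (noise strength) and `k` (number of spectral directions) are assumptions made for illustration; they are not taken from the paper, and the authors' actual implementation is in the linked repository.

```python
import numpy as np


def bcni(emb: np.ndarray, rho: float, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical batch-centered noise sketch (not the CAT-LVDM reference code).

    emb: (B, D) batch of conditioning embeddings.
    Each row is moved along its own offset from the batch mean,
    scaled by a random factor drawn uniformly from [-rho, rho].
    """
    center = emb.mean(axis=0, keepdims=True)                 # (1, D) batch centroid
    direction = emb - center                                 # (B, D) intra-batch semantic offsets
    scale = rng.uniform(-rho, rho, size=(emb.shape[0], 1))   # one random scalar per sample
    return emb + scale * direction


def sacn(emb: np.ndarray, rho: float, k: int, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical spectrum-aware noise sketch (not the CAT-LVDM reference code).

    Noise is drawn only in the span of the top-k right singular vectors of the
    centered batch, with a per-direction standard deviation proportional to the
    corresponding singular value.
    """
    centered = emb - emb.mean(axis=0, keepdims=True)
    # Thin SVD: centered = U @ diag(s) @ vt; rows of vt are the spectral directions
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    k = min(k, s.shape[0])
    coeffs = rng.standard_normal((emb.shape[0], k)) * s[:k]  # (B, k) scaled Gaussian coefficients
    # Dividing by sqrt(B) matches each direction's per-sample spread in the batch
    noise = (coeffs / np.sqrt(emb.shape[0])) @ vt[:k]        # (B, D), lies in the top-k span
    return emb + rho * noise


# Usage on random stand-in embeddings (B = 8 samples, D = 16 dimensions)
rng = np.random.default_rng(0)
emb = rng.standard_normal((8, 16))
noisy_b = bcni(emb, rho=0.1, rng=rng)
noisy_s = sacn(emb, rho=0.1, k=4, rng=rng)
print(noisy_b.shape, noisy_s.shape)  # (8, 16) (8, 16)
```

In this reading, both perturbations are "data-aligned": BCNI only moves each embedding along directions already present in the batch, and SACN confines noise to a low-rank subspace, which matches the abstract's emphasis on low-rank, data-aligned noise.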
May-29-2025
- Country:
- Asia
- China > Hong Kong (0.04)
- India
- Rajasthan (0.04)
- West Bengal > Kolkata (0.04)
- Japan > Honshū
- Chūbu > Nagano Prefecture
- Nagano (0.04)
- Kansai > Osaka Prefecture
- Osaka (0.04)
- Middle East > Jordan (0.14)
- Atlantic Ocean > Caribbean Sea (0.04)
- Europe
- France > Hauts-de-France
- Germany > Hesse
- Darmstadt Region > Frankfurt (0.04)
- Italy > Calabria
- Catanzaro Province > Catanzaro (0.04)
- Switzerland > Zürich
- Zürich (0.04)
- United Kingdom
- England
- Cambridgeshire > Cambridge (0.04)
- Oxfordshire > Oxford (0.04)
- North Sea > Southern North Sea (0.04)
- North America
- Curaçao (0.04)
- United States
- Michigan (0.04)
- New Jersey > Hudson County
- Hoboken (0.04)
- Oregon > Multnomah County
- Portland (0.04)
- Pennsylvania > Allegheny County
- Pittsburgh (0.04)
- Genre:
- Research Report
- Experimental Study (1.00)
- New Finding (1.00)
- Industry:
- Information Technology > Security & Privacy (0.67)
- Technology:
- Information Technology > Artificial Intelligence
- Machine Learning
- Neural Networks > Deep Learning (0.45)
- Statistical Learning (0.67)
- Natural Language (1.00)
- Representation & Reasoning (1.00)
- Vision (1.00)