Stabilizing Self-Consuming Diffusion Models with Latent Space Filtering
Zhongteng Cai, Yaxuan Wang, Yang Liu, Xueru Zhang
arXiv.org Artificial Intelligence
As synthetic data proliferates across the Internet, it is often reused to train successive generations of generative models. This creates a "self-consuming loop" that can lead to training instability or model collapse. Common strategies to address this issue, such as accumulating historical training data or injecting fresh real data, either increase computational cost or require expensive human annotation. In this paper, we empirically analyze the latent space dynamics of self-consuming diffusion models and observe that the low-dimensional structure of latent representations extracted from synthetic data degrades over generations. Based on this insight, we propose Latent Space Filtering (LSF), a novel approach that mitigates model collapse by filtering out less realistic synthetic data from mixed datasets. Theoretically, we present a framework that connects latent space degradation to empirical observations. Experimentally, we show that LSF consistently outperforms existing baselines across multiple real-world datasets, effectively mitigating model collapse without increasing training cost or relying on human annotation.
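The abstract does not spell out the filtering criterion, but one natural reading of "the low-dimensional structure of latent representations ... degrades over generations" is a subspace-fit test: fit a principal subspace to real-data latents, then discard the synthetic samples whose latents have the largest residual outside that subspace. The sketch below illustrates that reading only; the function name, `keep_ratio`, and `n_components` are hypothetical and not taken from the paper.

```python
# A minimal, illustrative sketch of latent-space filtering in the spirit of
# LSF. The paper's exact criterion is not given in the abstract; this version
# scores synthetic samples by how well their latents fit the low-dimensional
# (PCA) subspace of real-data latents and keeps the best-fitting fraction.
import numpy as np

def filter_synthetic_latents(real_latents, synthetic_latents,
                             n_components=32, keep_ratio=0.8):
    """Keep synthetic samples whose latents lie closest to the
    principal subspace spanned by real-data latents (hypothetical)."""
    # Center both sets using real-data statistics.
    mean = real_latents.mean(axis=0)
    real_centered = real_latents - mean
    syn_centered = synthetic_latents - mean

    # Principal subspace of the real latents via SVD.
    _, _, vt = np.linalg.svd(real_centered, full_matrices=False)
    basis = vt[:n_components]                      # (k, d)

    # Residual energy outside the real-data subspace as an "unrealism" score.
    projected = syn_centered @ basis.T @ basis     # (n, d)
    residual = np.linalg.norm(syn_centered - projected, axis=1)

    # Keep the fraction of synthetic samples with the smallest residuals.
    threshold = np.quantile(residual, keep_ratio)
    return residual <= threshold

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 128
    # Toy data: real latents concentrated on a 16-dim subspace; synthetic
    # latents drawn with extra isotropic noise to mimic degradation.
    subspace = rng.standard_normal((16, d))
    real = rng.standard_normal((1000, 16)) @ subspace
    synthetic = (rng.standard_normal((1000, 16)) @ subspace
                 + 0.5 * rng.standard_normal((1000, d)))
    mask = filter_synthetic_latents(real, synthetic, n_components=16)
    print(f"kept {mask.sum()} of {len(mask)} synthetic samples")
```

Under this reading, the filter needs no human annotation and no accumulation of historical data: it only requires latents from the current real pool and the candidate synthetic pool, which matches the cost claims made in the abstract.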
Nov-18-2025