Multi-modal Synthetic Data Training and Model Collapse: Insights from VLMs and Diffusion Models
Zizhao Hu, Mohammad Rostami, Jesse Thomason
arXiv.org Artificial Intelligence
Recent research has highlighted the risk of generative model collapse, in which a model's performance progressively degrades as it is continually trained on self-generated data. However, existing studies of model collapse are limited to single, unimodal models, which restricts our understanding of more realistic scenarios, such as diverse multi-modal AI agents that interact autonomously through synthetic data and evolve continually. We extend the study of synthetic data training and model collapse to multi-modal vision-language generative systems, such as vision-language models (VLMs) and text-to-image diffusion models, and to recursive generate-train loops involving multiple models. We find that model collapse, previously observed in single-modality generative models, exhibits distinct characteristics in the multi-modal context, such as improved vision-language alignment and increased variance in the VLM image-captioning task. Additionally, we find that general approaches such as increased decoding budgets, greater model diversity, and relabeling with frozen models can effectively mitigate model collapse. Our findings provide initial insights and practical guidelines for reducing the risk of model collapse in self-improving multi-agent AI systems and for curating robust multi-modal synthetic datasets.
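The recursive generate-train loop described in the abstract can be illustrated with a deliberately simple toy. The sketch below is not the paper's setup: it assumes a one-dimensional Gaussian as the "model", with hypothetical names and parameters (`recursive_fit`, `generations`, `sample_size`), and only shows the unimodal baseline effect in which a model refit solely on its own samples tends to lose variance over generations.

```python
import random
import statistics


def recursive_fit(generations=200, sample_size=10, seed=0):
    """Toy generate-train loop: sample from a Gaussian, then refit it on those samples."""
    rng = random.Random(seed)
    mean, std = 0.0, 1.0  # generation 0 stands in for the real-data distribution
    history = [std]
    for _ in range(generations):
        # "Generate": draw synthetic data from the current model.
        samples = [rng.gauss(mean, std) for _ in range(sample_size)]
        # "Train": refit on synthetic data only (maximum-likelihood estimates).
        mean = statistics.fmean(samples)
        std = statistics.pstdev(samples)
        history.append(std)
    return history


history = recursive_fit()
print(f"std at generation 0: {history[0]:.3f}")
print(f"std at generation {len(history) - 1}: {history[-1]:.3e}")
```

With a small `sample_size`, the fitted standard deviation tends to shrink toward zero across generations. The multi-modal behaviors reported in the abstract, and mitigations such as relabeling with frozen models, are outside what this toy captures.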
May-15-2025
- Country:
  - North America > United States
    - New Mexico > Bernalillo County
      - Albuquerque (0.04)
    - California > Los Angeles County
      - Los Angeles (0.28)
  - Europe
    - Monaco (0.04)
    - Spain > Andalusia
      - Granada Province > Granada (0.04)
    - Latvia > Pārgauja Municipality
      - Stalbe (0.04)
  - Asia > Middle East
    - Jordan (0.04)
    - Saudi Arabia > Asir Province
      - Abha (0.04)
- Genre:
  - Research Report > New Finding (0.66)
- Industry:
  - Health & Medicine (0.68)
- Technology:
  - Information Technology > Artificial Intelligence
    - Vision (1.00)
    - Representation & Reasoning > Agents (0.88)
    - Natural Language
      - Large Language Model (1.00)
      - Generation (0.69)
    - Machine Learning > Neural Networks
      - Deep Learning > Generative AI (0.46)