Nepotistically Trained Generative-AI Models Collapse
arXiv.org Artificial Intelligence
From text to audio and image, today's generative-AI systems are trained on large quantities of human-generated content. Most of this content is obtained by scraping a variety of online sources. As generative AI becomes more common, it is reasonable to expect that future data scraping will inevitably capture generative AI's own creations. We ask what happens when these generative systems are trained on varying combinations of human-generated and AI-generated content. Although it is early in the evolution of generative AI, there is already some evidence that retraining a generative-AI model on its own creations (what we call model poisoning) leads to a range of artifacts in the output of the newly trained model. It has been shown, for example, that when retrained on their own output, large language models (LLMs) develop irreversible defects that cause them to produce gibberish, so-called model collapse [22].
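As a toy illustration of the retraining loop described above (not the experiment in this work), the sketch stands in a one-dimensional Gaussian for the generative model. Each generation refits the mean and standard deviation to a mix of fresh "human" samples and samples drawn from the previous generation's fit. The function `retrain_loop` and its parameters are hypothetical names chosen for this sketch. With only self-generated data, the fitted spread tends to drift toward zero, a simple analogue of collapse; mixing in real data keeps it anchored.

```python
import random
import statistics


def retrain_loop(generations, n_samples, synthetic_fraction, seed=0):
    """Toy self-consuming training loop (hypothetical illustration).

    The "model" is a 1-D Gaussian. Each generation draws a training set that
    is part real data (from the fixed true distribution N(0, 1)) and part
    synthetic data (from the previous generation's fitted Gaussian), then
    refits mean and standard deviation. Returns the fitted std per generation.
    """
    rng = random.Random(seed)
    real_mu, real_sigma = 0.0, 1.0  # the fixed "human" data distribution
    mu, sigma = real_mu, real_sigma  # generation 0 starts from a perfect fit
    history = []
    for _ in range(generations):
        n_synth = round(synthetic_fraction * n_samples)
        n_real = n_samples - n_synth
        # Fresh real samples plus samples generated by the current model.
        data = [rng.gauss(real_mu, real_sigma) for _ in range(n_real)]
        data += [rng.gauss(mu, sigma) for _ in range(n_synth)]
        # Maximum-likelihood refit; pstdev divides by n, so it slightly
        # underestimates the spread each generation.
        mu = statistics.fmean(data)
        sigma = statistics.pstdev(data)
        history.append(sigma)
    return history
```

For example, `retrain_loop(500, 10, 1.0)` trains purely on the model's own output and typically ends with a standard deviation many orders of magnitude below 1, whereas `retrain_loop(50, 200, 0.0)` trains only on real data and stays close to 1. The shrinkage comes from the small per-generation underestimate of the spread plus sampling noise compounding across generations; it illustrates the feedback loop, not the specific artifacts reported in this work.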
Nov-20-2023
- Country:
  - North America > United States
    - California
      - Santa Clara County > Palo Alto (0.04)
      - Alameda County > Berkeley (0.04)
  - Asia
    - Middle East > Republic of Türkiye
      - Karaman Province > Karaman (0.04)
    - Japan > Honshū
      - Chūbu > Nagano Prefecture > Nagano (0.04)
- Genre:
  - Research Report (0.64)
- Industry:
  - Information Technology > Security & Privacy (0.69)
  - Leisure & Entertainment > Sports (0.47)
- Technology: