Nepotistically Trained Generative-AI Models Collapse
arXiv.org Artificial Intelligence
From text to audio and images, today's generative-AI systems are trained on large quantities of human-generated content, most of it obtained by scraping a variety of online sources. As generative AI becomes more common, it is reasonable to expect that future data scraping will inevitably sweep up generative AI's own creations. We ask what happens when these generative systems are trained on varying combinations of human-generated and AI-generated content. Although it is early in the evolution of generative AI, there is already some evidence that retraining a generative-AI model on its own creations, what we call model poisoning, leads to a range of artifacts in the output of the newly trained model. It has been shown, for example, that when retrained on their own output, large language models (LLMs) develop irreversible defects that cause them to produce gibberish, so-called model collapse [22].
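The retraining loop described above can be sketched with a toy one-dimensional model: repeatedly fit a Gaussian to data, sample from the fit, and train the next generation on those samples. As a stand-in for the mode-seeking bias of real generative models, this sketch keeps only the half of each synthetic batch closest to the mean; that filtering step is an illustrative assumption, not a method from the paper. Under this assumption the fitted standard deviation collapses toward zero within a few generations:

```python
import random
import statistics

def fit_gaussian(samples):
    # Maximum-likelihood fit of a 1-D Gaussian: sample mean and stdev.
    return statistics.fmean(samples), statistics.pstdev(samples)

def collapse_demo(n=2000, generations=10, seed=42):
    rng = random.Random(seed)
    data = [rng.gauss(0.0, 1.0) for _ in range(n)]  # "human" training data
    history = []
    for _ in range(generations):
        mu, sigma = fit_gaussian(data)
        history.append(sigma)
        # Next generation is trained purely on the model's own samples.
        samples = [rng.gauss(mu, sigma) for _ in range(n)]
        # Assumed mode-seeking bias: keep only the high-probability half
        # of the synthetic batch (samples closest to the mean).
        samples.sort(key=lambda x: abs(x - mu))
        data = samples[: n // 2]
    return history

history = collapse_demo()
print(f"gen 0 stdev: {history[0]:.3f}, gen {len(history)-1} stdev: {history[-1]:.6f}")
```

Each generation the fitted spread shrinks by a roughly constant factor, so diversity is lost geometrically: a minimal analogue of the degenerate, repetitive output that collapsed LLMs exhibit.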
Nov-20-2023