Nepotistically Trained Generative-AI Models Collapse

Bohacek, Matyas, Farid, Hany

arXiv.org Artificial Intelligence 

From text to audio and images, today's generative-AI systems are trained on large quantities of human-generated content, most of it obtained by scraping a variety of online sources. As generative AI becomes more common, it is reasonable to expect that future data scraping will invariably capture generative AI's own creations. We ask what happens when these generative systems are trained on varying combinations of human-generated and AI-generated content. Although it is early in the evolution of generative AI, there is already some evidence that retraining a generative-AI model on its own creations - what we call model poisoning - leads to a range of artifacts in the output of the newly trained model. It has been shown, for example, that when retrained on their own output, large language models (LLMs) develop irreversible defects that cause them to produce gibberish - so-called model collapse [22].
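The collapse phenomenon described above can be illustrated with a toy simulation (this is an illustrative sketch, not the paper's experiment): a simple Gaussian "model" is fit to data, new data are sampled from the fitted model, the model is refit on those samples, and the cycle repeats. Because the maximum-likelihood variance estimate is biased low and each generation compounds sampling error, the learned distribution degenerates over successive generations.

```python
import random
import statistics

def fit_gaussian(samples):
    """Fit a 1-D Gaussian by maximum likelihood (biased variance)."""
    mu = statistics.fmean(samples)
    sigma = statistics.pstdev(samples)  # MLE std, divides by n
    return mu, sigma

random.seed(0)
mu, sigma = 0.0, 1.0   # the "true" human-data distribution
n = 50                 # samples per generation (small to exaggerate the effect)
variances = []

# Each generation trains only on the previous generation's synthetic output.
for generation in range(200):
    samples = [random.gauss(mu, sigma) for _ in range(n)]
    mu, sigma = fit_gaussian(samples)
    variances.append(sigma ** 2)

print(f"initial learned variance: {variances[0]:.3f}")
print(f"final learned variance:   {variances[-1]:.3f}")
```

After many generations the learned variance shrinks toward zero, so the model loses the diversity of the original distribution; this mirrors, in miniature, the degradation that recursive training on synthetic data produces in large generative models.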