Generative AI Degrades Online Communities

Feb-16-2024, 17:40:18 GMT–Communications of the ACM

ChatGPT generates believable text about nearly any subject, but there is a big difference between "believable" and "correct." Like other LLMs, ChatGPT is trained on large swaths of publicly available data, much of it scraped from online forums such as Stack Overflow and Reddit. Because the volume of available training data differs across topics, ChatGPT's performance naturally varies by topic and may in turn affect communities to different degrees. We observed that ChatGPT's impact on Stack Overflow participation varies significantly across topics, in line with its expected performance given the available training data. Topics related to open-source tools and general-purpose programming languages (for example, Python, R) appeared to experience larger declines in participation and contribution than proprietary and closed technologies, such as those employed for enterprise server-side development (for example, Spring Framework, AWS, Azure).

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language
    - Large Language Model (1.00)
    - Chatbot (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning > Generative AI (0.44)