Generative AI Degrades Online Communities

Communications of the ACM 

ChatGPT generates believable text about nearly any subject, but there is a big difference between "believable" and "correct." Like other LLMs, ChatGPT is trained on large swaths of publicly available data, much of it scraped from online forums such as Stack Overflow and Reddit. Because the volume of available data differs from topic to topic, ChatGPT's performance naturally varies by topic and may in turn affect communities to different degrees. We observed that ChatGPT's impact on Stack Overflow participation varies significantly across topics, in line with its expected performance given the available training data. Topics related to open-source tools and general-purpose programming languages (for example, Python, R) appeared to experience larger declines in participation and contribution than topics related to proprietary and closed technologies, such as those employed for enterprise server-side development (for example, Spring Framework, AWS, Azure).