The Rise of AI-Generated Content in Wikipedia
Creston Brooks, Samuel Eggert, Denis Peskoff
arXiv.org Artificial Intelligence
The rise of AI-generated content in popular information sources raises significant concerns about accountability, accuracy, and bias amplification. Beyond directly impacting consumers, the widespread presence of this content poses questions for the long-term viability of training language models on vast internet sweeps. We use GPTZero, a proprietary AI detector, and Binoculars, an open-source alternative, to establish lower bounds on the presence of AI-generated content in recently created Wikipedia pages. Both detectors reveal a marked increase in AI-generated content in recent pages compared to those from before the release of GPT-3.5. With thresholds calibrated to achieve a 1% false positive rate on pre-GPT-3.5 articles, detectors flag over 5% of newly created English Wikipedia articles as AI-generated, with lower percentages for German, French, and Italian articles. Flagged Wikipedia articles are typically of lower quality and are often self-promotional or partial towards a specific viewpoint on controversial topics.
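The calibration step described in the abstract can be sketched as follows: choose a score threshold so that at most 1% of pre-GPT-3.5 articles are flagged, then apply that threshold to newer articles. This is a minimal illustration, not the authors' code. The function names, the synthetic scores, and the convention that a higher score means "more likely AI-generated" are all assumptions; for a detector with the opposite orientation, negate the scores first.

```python
import random


def calibrate_threshold(baseline_scores, target_fpr=0.01):
    """Pick a threshold so that at most `target_fpr` of baseline scores exceed it.

    Assumption: higher score = more likely AI-generated.
    """
    ordered = sorted(baseline_scores)
    n = len(ordered)
    # Largest number of baseline articles we may flag without exceeding the target.
    allowed = int(target_fpr * n + 1e-9)
    # Scores strictly greater than this value are flagged.
    return ordered[n - allowed - 1]


def flag_rate(scores, threshold):
    """Fraction of articles whose score is strictly above the threshold."""
    return sum(s > threshold for s in scores) / len(scores)


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical detector scores, for illustration only.
    pre_release = [random.gauss(0.2, 0.1) for _ in range(2000)]
    recent = [random.gauss(0.2, 0.1) for _ in range(1800)]
    recent += [random.gauss(0.8, 0.1) for _ in range(200)]

    threshold = calibrate_threshold(pre_release, target_fpr=0.01)
    print("threshold:", threshold)
    print("baseline flag rate:", flag_rate(pre_release, threshold))
    print("recent flag rate:", flag_rate(recent, threshold))
```

Because the threshold is fixed on articles that predate GPT-3.5, roughly 1% of purely human-written articles are expected to be flagged by chance. A flag rate on recent articles well above that level is what the abstract reports for English Wikipedia.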
Oct-10-2024
- Country:
- Africa > Ghana (0.04)
- Asia
- Bangladesh > Dhaka Division
- Dhaka District > Dhaka (0.04)
- India (0.04)
- Middle East
- Turkmenistan > Ahal Region
- Ashgabat (0.04)
- Europe
- Middle East > Malta
- Eastern Region > Northern Harbour District > St. Julian's (0.04)
- United Kingdom (0.04)
- North America
- Belize (0.15)
- United States (0.14)
- Genre:
- Research Report (1.00)
- Industry:
- Government (1.00)
- Law (0.93)
- Technology: