How AI and Wikipedia have sent vulnerable languages into a doom spiral
Machine translators have made it easier than ever to create error-plagued Wikipedia articles in obscure languages. What happens when AI models get trained on junk pages?

When Kenneth Wehr started managing the Greenlandic-language version of Wikipedia four years ago, his first act was to delete almost everything. It had to go, he thought, if it had any chance of surviving. Wehr, who's 26, isn't from Greenland--he grew up in Germany--but he had become obsessed with the island, an autonomous Danish territory, after visiting as a teenager. He'd spent years writing obscure Wikipedia articles in his native tongue on virtually everything to do with it. He even ended up moving to Copenhagen to study Greenlandic, a language spoken by some 57,000 mostly Indigenous Inuit people scattered across dozens of far-flung Arctic villages. The Greenlandic-language edition was added to Wikipedia around 2003, just a few years after the site launched in English. By the time Wehr took its helm nearly 20 years later, hundreds of Wikipedians had contributed to it and had collectively written some 1,500 articles totaling tens of thousands of words.
- North America > Greenland (0.24)
- Europe > Germany (0.24)
- Europe > Denmark > Capital Region > Copenhagen (0.24)
- (8 more...)
Wikibench: Community-Driven Data Curation for AI Evaluation on Wikipedia
Kuo, Tzu-Sheng, Halfaker, Aaron, Cheng, Zirui, Kim, Jiwoo, Wu, Meng-Hsin, Wu, Tongshuang, Holstein, Kenneth, Zhu, Haiyi
AI tools are increasingly deployed in community contexts. However, datasets used to evaluate AI are typically created by developers and annotators outside a given community, which can yield misleading conclusions about AI performance. How might we empower communities to drive the intentional design and curation of evaluation datasets for AI that impacts them? We investigate this question on Wikipedia, an online community with multiple AI-based content moderation tools deployed. We introduce Wikibench, a system that enables communities to collaboratively curate AI evaluation datasets, while navigating ambiguities and differences in perspective through discussion. A field study on Wikipedia shows that datasets curated using Wikibench can effectively capture community consensus, disagreement, and uncertainty. Furthermore, study participants used Wikibench to shape the overall data curation process, including refining label definitions, determining data inclusion criteria, and authoring data statements. Based on our findings, we propose future directions for systems that support community-driven data curation.
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
- North America > United States > Hawaii > Honolulu County > Honolulu (0.05)
- North America > United States > Virginia (0.04)
- (14 more...)
Wikipedia Will Survive A.I.
Welcome to Source Notes, a Future Tense column about the internet's information ecosystem. Wikipedia is, to date, the largest and most-read reference work in human history. But the editors who update and maintain Wikipedia are certainly not complacent about its place as the preeminent information resource, and are worried about how it might be displaced by generative A.I. At last week's Wikimania, the site's annual user conference, one of the sessions was "ChatGPT vs. WikiGPT," and a panelist at the event mentioned that rather than visiting Wikipedia, people seem to be going to ChatGPT for their information needs. Veteran Wikipedians have cast ChatGPT as an existential threat, predicting that A.I. chatbots will supplant Wikipedia in the same way that Wikipedia infamously dethroned Encyclopedia Britannica back in 2005.
- Information Technology > Communications > Social Media (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.52)
Should ChatGPT Be Used to Write Wikipedia Articles?
Welcome to Source Notes, a Future Tense column about the internet's information ecosystem. Five years ago, I traveled to Stockholm to cover the annual convention for Wikipedia and related free knowledge projects. But it was not just wiki-interviews and chewy candy fish that occupied my time among the Swedes. During one fun evening, I came across a group playing a tabletop game envisioning what Wikipedia would be like in 2035. This futuristic Dungeons & Dragons-style role-playing game featured a cast of diverse characters like Yuki, an A.I. pop music composer and Wikipedia writer, and Levi, a passionate neo-Luddite who believed Wikipedia should be composed by humans only.
- North America > United States (0.70)
- Europe > Sweden > Stockholm > Stockholm (0.25)
- Leisure & Entertainment (0.89)
- Media > Music (0.55)
- Information Technology > Communications > Social Media (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.99)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.78)
China and Taiwan clash over Wikipedia edits
Ask Google or Siri: "What is Taiwan?" "A state", they will answer, "in East Asia". But earlier in September, it would have been a "province in the People's Republic of China". And Wikipedia had suddenly changed. The edit was reversed, but soon made again. It became an editorial tug of war that - as far as the encyclopedia was concerned - caused the state of Taiwan to constantly blink in and out of existence over the course of a single day.
Defense Mechanism or Socialization Tactic? Improving Wikipedia’s Notifications to Rejected Contributors
Geiger, R. Stuart (University of California, Berkeley) | Halfaker, Aaron (University of Minnesota) | Pinchuk, Maryana (Wikimedia Foundation) | Walling, Steven (Wikimedia Foundation)
Unlike traditional firms, open collaborative systems rely on volunteers to operate, and many communities struggle to maintain enough contributors to ensure the quality and quantity of content. However, Wikipedia has historically faced the exact opposite problem: too much participation, particularly from users who, knowingly or not, do not share the same norms as veteran Wikipedians. During its period of exponential growth, the Wikipedian community developed specialized socio-technical defense mechanisms to protect itself from the negatives of massive participation: spam, vandalism, falsehoods, and other damage. Yet recently, Wikipedia has faced a number of high-profile issues with recruiting and retaining new contributors. In this paper, we first illustrate and describe the various defense mechanisms at work in Wikipedia, which we hypothesize are inhibiting newcomer retention. Next, we present results from an experiment aimed at increasing both the quantity and quality of editors by altering various elements of these defense mechanisms, specifically pre-scripted warnings and notifications that are sent to new editors upon reverting or rejecting contributions. Using logistic regressions to model new user activity, we show which tactics work best for different populations of users based on their motivations when joining Wikipedia. In particular, we found that personalized messages in which Wikipedians identified themselves in active voice and took direct responsibility for rejecting an editor’s contributions were much more successful across a variety of outcome metrics than the current messages, which typically use an institutional and passive voice.
- North America > United States > Minnesota (0.04)
- North America > United States > California > Alameda County > Berkeley (0.04)
- Research Report > New Finding (0.88)
- Research Report > Experimental Study (0.88)
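The abstract above says the authors used logistic regressions to model new-user activity as a function of the rejection messages editors received. A minimal sketch of that kind of analysis is below; the data are synthetic and the variable names (`personalized`, `prior_edits`) are illustrative assumptions, not the study's actual features. The point is only the shape of the method: a binary treatment feature whose fitted coefficient indicates whether personalized messages are associated with higher retention odds.

```python
# Hypothetical sketch of logistic-regression modeling of editor retention.
# Synthetic data; not the study's actual variables or results.
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit logistic-regression weights (w[0] is the intercept)
    by batch gradient descent on the log-loss."""
    n_features = len(X[0])
    w = [0.0] * (n_features + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = p - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        for j in range(len(w)):
            w[j] -= lr * grad[j] / len(X)
    return w

# Generate synthetic observations: [personalized_message (0/1),
# prior_edits (scaled to 0..1)], label = editor kept contributing.
random.seed(0)
X, y = [], []
for _ in range(200):
    personalized = random.randint(0, 1)
    prior = random.random()
    # Assumed data-generating process: personalized messages
    # raise the log-odds of retention.
    p_retained = sigmoid(-1.0 + 1.5 * personalized + 0.5 * prior)
    X.append([personalized, prior])
    y.append(1 if random.random() < p_retained else 0)

w = fit_logistic(X, y)
print(f"intercept={w[0]:.2f}, personalized={w[1]:.2f}, prior_edits={w[2]:.2f}")
```

In a real analysis the sign and magnitude of the `personalized` coefficient (and its standard error, which this sketch omits) would be what supports a claim like the paper's, that personalized, active-voice messages outperform institutional ones.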