Collaborating Authors

 wikimedia


Wikipedia is struggling with voracious AI bot crawlers

Engadget

Wikimedia has seen a 50 percent increase in bandwidth used for downloading multimedia content since January 2024, the foundation said in an update. But it's not because human readers have suddenly developed a voracious appetite for consuming Wikipedia articles and for watching videos or downloading files from Wikimedia Commons. No, the spike in usage came from AI crawlers, or automated programs scraping Wikimedia's openly licensed images, videos, articles and other files to train generative artificial intelligence models. This sudden increase in traffic from bots could slow down access to Wikimedia's pages and assets, especially during high-interest events. When Jimmy Carter died in December, for instance, people's heightened interest in the video of his presidential debate with Ronald Reagan caused slow page load times for some users.


Wikimedia's CTO: In the age of AI, human contributors still matter

MIT Technology Review

It is undeniable that technological advances and cultural shifts have transformed our online universe over the years--especially with the recent surge in AI-generated content--but Selena Deckelmann still isn't afraid of people on the internet. She believes they are its future. In the summer of 2022, when she stepped into the newly created role of chief product and technology officer (CPTO), Deckelmann didn't know that a few months later, the race to build generative AI would accelerate to a breakneck pace. With the release of OpenAI's ChatGPT and other large language models, and the multibillion-dollar funding cycle that followed, 2023 became the year of the chatbot. And because these models require heaps of cheap (or, preferably, even free) content to function, Wikipedia's tens of millions of articles have become a rich source of fuel. To anyone who's spent time on the internet, it makes sense that bots and bot builders would look to Wikipedia to strengthen their own knowledge collections.


Fair multilingual vandalism detection system for Wikipedia

Trokhymovych, Mykola, Aslam, Muniza, Chou, Ai-Jou, Baeza-Yates, Ricardo, Saez-Trumper, Diego

arXiv.org Artificial Intelligence

This paper presents a novel system design aimed at supporting the Wikipedia community in addressing vandalism on the platform. To achieve this, we collected a massive dataset covering 47 languages and applied advanced filtering and feature engineering techniques, including multilingual masked language modeling, to build the training dataset from human-generated data. The performance of the system was evaluated through comparison with the one used in production in Wikipedia, known as ORES. Our research results in a significant increase in the number of languages covered, making Wikipedia patrolling more efficient for a wider range of communities. Furthermore, our model outperforms ORES, ensuring that the results provided are not only more accurate but also less biased against certain groups of contributors.


Creating Synthetic Data for Machine Learning

#artificialintelligence

We start with some imports. I am using PIL (Pillow) to create the images and pascal (PascalVoc) to save the information as annotations. I downloaded a few images of orange trees from the web and started to sample pixels.
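As a rough illustration of the annotation step, the sketch below picks random circular "orange" positions on a canvas and records each bounding box in a Pascal VOC style XML document. It uses only Python's standard library, not PIL or the pascal package mentioned above, so the drawing itself is left out. The function names, image size, file name, and label are all assumptions made for this example, not the author's code.

```python
import random
import xml.etree.ElementTree as ET


def random_boxes(width, height, count, radius_range=(10, 30), seed=0):
    """Return `count` boxes (xmin, ymin, xmax, ymax) for circles that
    fit entirely inside a width x height image."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    boxes = []
    for _ in range(count):
        r = rng.randint(*radius_range)
        cx = rng.randint(r, width - r)   # keep the circle inside the image
        cy = rng.randint(r, height - r)
        boxes.append((cx - r, cy - r, cx + r, cy + r))
    return boxes


def voc_annotation(filename, width, height, boxes, label="orange"):
    """Build a Pascal VOC style XML element describing the labelled boxes."""
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    ET.SubElement(size, "width").text = str(width)
    ET.SubElement(size, "height").text = str(height)
    ET.SubElement(size, "depth").text = "3"  # assumes an RGB image
    for xmin, ymin, xmax, ymax in boxes:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = label
        bndbox = ET.SubElement(obj, "bndbox")
        ET.SubElement(bndbox, "xmin").text = str(xmin)
        ET.SubElement(bndbox, "ymin").text = str(ymin)
        ET.SubElement(bndbox, "xmax").text = str(xmax)
        ET.SubElement(bndbox, "ymax").text = str(ymax)
    return root


# Hypothetical image name and size, chosen only for illustration.
boxes = random_boxes(640, 480, count=5)
annotation = voc_annotation("synthetic_0001.png", 640, 480, boxes)
xml_text = ET.tostring(annotation, encoding="unicode")
print(xml_text)
```

In a full pipeline, each box would also be drawn onto the image (for example with Pillow) using colors sampled from real photos, so that the image and its annotation file stay in sync.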


What are RNNs and LSTMs in Deep Learning?

#artificialintelligence

Many of the most impressive advances in natural language processing and AI chatbots are driven by Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks. RNNs and LSTMs are special neural network architectures that are able to process sequential data, that is, data where chronological ordering matters. LSTMs are essentially improved versions of RNNs, capable of interpreting longer sequences of data. Let's take a look at how RNNs and LSTMs are structured and how they enable the creation of sophisticated natural language processing systems. So before we talk about how Long Short-Term Memory (LSTM) networks and Recurrent Neural Networks (RNNs) work, we should discuss the format of a neural network in general.
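To make the idea of sequential processing concrete, here is a minimal sketch of a single-unit vanilla RNN in plain Python. At every step the hidden state is updated from the current input and the previous hidden state, which is why the order of the inputs matters. The weights are hypothetical, untrained values picked for illustration, and the sketch does not include the gating mechanism that LSTMs add on top of this recurrence.

```python
import math


def rnn_forward(inputs, w_x, w_h, bias, h0=0.0):
    """Run a one-unit vanilla RNN over a sequence of scalars.

    Each step mixes the current input with the previous hidden state:
        h_t = tanh(w_x * x_t + w_h * h_{t-1} + bias)
    Returns the list of hidden states, one per input.
    """
    h = h0
    states = []
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h + bias)
        states.append(h)
    return states


# Hypothetical weights chosen only for illustration (not trained).
W_X, W_H, BIAS = 0.5, 0.8, 0.0

forward = rnn_forward([1.0, 0.0, -1.0], W_X, W_H, BIAS)
backward = rnn_forward([-1.0, 0.0, 1.0], W_X, W_H, BIAS)

# Same values in a different order lead to different final hidden states.
print(forward[-1], backward[-1])
```

A feed-forward network given the same three numbers as an unordered bag of values would have no way to tell the two sequences apart; the recurrence through the hidden state is what carries the ordering information.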


Amazon Owes Wikipedia Big-Time

Slate

When you ask Amazon's Alexa, "What is Wikipedia?" Alexa took this line directly from Wikipedia's entry on Wikipedia, as it does with many of its answers. Perhaps what it should have said was this: "Wikipedia is the source from which I take much of my information, without credit, contribution, or compensation." Amazon recently donated $1 million to the Wikimedia Endowment, a fund that keeps Wikipedia running, as "part of Amazon's and CEO Jeff Bezos' growing work in philanthropy," according to CNET. It's being framed as a "gift," one that--as Amazon puts it--recognizes their shared vision to "make it easier to share knowledge globally."



Some (Exciting) News – caroline sinders – Medium

#artificialintelligence

It's a weird thing to write: when you've left your first dream job (IBM Watson) for a dream opportunity (a residency with BuzzFeed and Eyebeam) and now… you're… I'm incredibly excited to announce that, starting in April, I will be a researcher and designer focused specifically on online harassment at Wikimedia. And I'll be working specifically with a team using machine learning. It sounds eerily like my fellowship proposal, doesn't it? Okay, it's not a weird thing to write, it's an extremely amazing, fantastic, and humbling opportunity.


Former Astronaut Dan Barry: How to Meet an Alien (AI)

#artificialintelligence

"We have enough human brains, I have no interest in replicating the human brain." For some individuals, an artificial brain represents a chance to fully understand one's own nature. For others, the artificial mind represents self-improvement, an opportunity to make their human brain faster, smarter, less forgetful, geared towards long-term thinking. Longevity-minded individuals find the allure of artificial intelligence to be linked to the promise of a brain that can transcend the fragile mortality found in the biological models. Optimists pitch artificial intelligence to the world as our ultimate servant, the benevolent, rational being that only wants to help.