AI Weekly: The challenges of creating open source AI training datasets

#artificialintelligence 

Indeed, creating AI training datasets in a privacy-preserving, ethical way remains a major blocker for researchers in the AI community, particularly those who specialize in computer vision. In January 2019, IBM released a corpus of nearly a million photos of people from Flickr, designed to mitigate bias in facial recognition algorithms. But IBM notified neither the photographers nor the subjects of the photos that their images would be included. Separately, an earlier version of ImageNet, a dataset used to train AI systems around the world, was found to contain photos of naked children, porn actresses, college parties, and more, all scraped from the web without those individuals' consent. "There are real harms that have emerged from casual repurposing, open-sourcing, collecting, and scraping of biometric data," said Liz O'Sullivan, cofounder and technology director at the Surveillance Technology Oversight Project, a nonprofit organization litigating and advocating for privacy.
