AI Weekly: The challenges of creating open source AI training datasets

Mar-4-2021, 06:10:40 GMT

Indeed, creating AI training datasets in a privacy-preserving, ethical way remains a major blocker for researchers in the AI community, particularly those who specialize in computer vision. In January 2019, IBM released a corpus of nearly a million photos of people from Flickr, designed to mitigate bias in facial recognition algorithms. But IBM notified neither the photographers nor the subjects of the photos that the images would be included. Separately, an earlier version of ImageNet, a dataset used to train AI systems around the world, was found to contain photos of naked children, porn actresses, college parties, and more, all scraped from the web without those individuals' consent. "There are real harms that have emerged from casual repurposing, open-sourcing, collecting, and scraping of biometric data," said Liz O'Sullivan, cofounder and technology director at the Surveillance Technology Oversight Project, a nonprofit organization litigating and advocating for privacy.
