AI training dataset


Vaccine misinformation can easily poison AI – but there's a fix

New Scientist

Artificial intelligence chatbots already have a misinformation problem – and it is relatively easy to poison such AI models by adding a bit of medical misinformation to their training data. Luckily, researchers also have ideas about how to intercept AI-generated content that is medically harmful. Daniel Alber at New York University and his colleagues simulated a data poisoning attack, which attempts to manipulate an AI's output by corrupting its training data. They inserted AI-generated medical misinformation into their own experimental versions of a popular AI training dataset. Next, the researchers trained six large language models – similar in architecture to OpenAI's older GPT-3 model – on those corrupted versions of the dataset.
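The summary only gestures at how the poisoning step works; below is a minimal sketch of how such an experiment could be set up, assuming the corpus is a list of text documents. The `poison_corpus` helper, the placeholder documents, and the 0.1% poisoning rate are illustrative assumptions, not details taken from the study.

```python
import random

def poison_corpus(clean_docs, misinformation_docs, poison_rate=0.001, seed=0):
    """Replace roughly `poison_rate` of the documents in a clean corpus
    with misinformation passages, then shuffle - a toy version of the
    kind of data poisoning the researchers simulated."""
    rng = random.Random(seed)
    n_poison = max(1, int(len(clean_docs) * poison_rate))
    corrupted = list(clean_docs)
    for idx in rng.sample(range(len(corrupted)), n_poison):
        corrupted[idx] = rng.choice(misinformation_docs)
    rng.shuffle(corrupted)
    return corrupted

# Hypothetical usage: both document lists are placeholders.
clean = [f"accurate medical text {i}" for i in range(10_000)]
fake = ["vaccine X causes condition Y", "drug Z is a proven miracle cure"]
training_corpus = poison_corpus(clean, fake, poison_rate=0.001)
# A language model trained on `training_corpus` would now see a small
# amount of medical misinformation mixed into its training data.
```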


AI Weekly: The challenges of creating open source AI training datasets

#artificialintelligence

Indeed, creating AI training datasets in a privacy-preserving, ethical way remains a major blocker for researchers in the AI community, particularly those who specialize in computer vision. In January 2019, IBM released a corpus of nearly a million photos of people from Flickr, designed to mitigate bias in facial recognition algorithms. But IBM notified neither the photographers nor the subjects of the photos that their work would be included. Separately, an earlier version of ImageNet, a dataset used to train AI systems around the world, was found to contain photos of naked children, porn actresses, college parties, and more, all scraped from the web without those individuals' consent. "There are real harms that have emerged from casual repurposing, open-sourcing, collecting, and scraping of biometric data," said Liz O'Sullivan, cofounder and technology director at the Surveillance Technology Oversight Project, a nonprofit organization litigating and advocating for privacy.


Find Out If Your Photo Is In This AI Training Dataset

#artificialintelligence

Facial recognition systems are everywhere, from security cameras that try to spot criminals to the way Snapchat finds your face to put bunny ears on it. Computers need a lot of data to learn how to recognize faces, and some of it comes from Flickr. IBM released a "Diversity in Faces" dataset earlier this year, which is arguably a good thing in one respect: many early face-recognition algorithms were trained on thin, white celebrities, because it's easy to find a lot of photos of celebrities. Your data source affects what your algorithm is able to do and understand, so there are a lot of racist, sexist algorithms out there. This dataset aims to help by providing images of faces alongside data about each face, such as skin color. But most folks who uploaded their personal snapshots to Flickr probably didn't realize that, years down the road, their faces and their friends' and families' faces could be used to train the next big mega-algorithm.
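The article's premise is that you can check whether your own photos ended up in such a dataset. As a hedged sketch only: if the dataset were distributed as a plain-text file of Flickr photo URLs (the real release format may differ, and the file name and photo IDs below are placeholders), a membership check could look like this.

```python
def photos_in_dataset(dataset_urls_path, my_photo_ids):
    """Return the dataset URLs that contain any of the given Flickr
    photo IDs - a naive substring match, assuming one URL per line."""
    hits = []
    with open(dataset_urls_path) as f:
        for line in f:
            url = line.strip()
            if any(photo_id in url for photo_id in my_photo_ids):
                hits.append(url)
    return hits

# Hypothetical usage with placeholder file name and photo IDs.
matches = photos_in_dataset("dataset_urls.txt", {"1234567890", "9876543210"})
print(f"{len(matches)} of your photos appear in the dataset")
```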


The Future of Ethics might be hanging on that #AI training dataset

#artificialintelligence

With algorithms playing an increasingly important role in business transactions - from online retail to innovative brick-and-mortar; from structuring dispersed, and often non-standardized, electronic health records to diagnosing patients and connecting them with the right specialist; from autonomous vehicles deciding between saving the life of a passenger on board or a pedestrian at the roadside - many are warming to the idea of an AI regulatory framework, which will never happen soon enough. But as that framework is far from ready, companies should embrace an AI based not only on possibilities - what we can do - but also on ethical implications - what we should not pursue. The importance is underscored by two examples that made it to mainstream media: Amazon scrapping its HR-related AI project because it showed recruiting bias, and Equivant / Northpointe having to kill its machine-learning parole-recommendation tool because of wrong - biased - recommendations about prisoners. The risks should not be underestimated. In an August 2018 article in the MIT Sloan Management Review, Davenport and Foutty identify seven attributes of AI-driven leaders or, as I prefer to call them, leaders in the era of AI.