Tiny alterations in training data can introduce "backdoors" into machine learning models

#artificialintelligence 

In TrojDRL: Trojan Attacks on Deep Reinforcement Learning Agents, a group of Boston University researchers demonstrate an attack on machine learning systems trained with "reinforcement learning" in which ML systems derive solutions to complex problems by iteratively trying multiple solutions. The attack is related to adversarial examples, a class of attacks that involve probing a machine-learning model to find "blind spots" -- very small changes (usually imperceptible to humans) that cause machine learning classifiers' accuracy to shelve off rapidly (for example, a small change to a model of a gun can make an otherwise reliable classifier think it's looking at a helicopter). It's not clear whether it's possible to create a machine learning model that's immune to adversarial examples (the expert I trust most on this told me off the record that they think it's not), but what the researchers behind Trojdrl propose is a method for deliberately introducing adversarial examples by slipping difficult-to-spot changes into training data, which will produce defects in the eventual model that can serve as a "backdoor" that future adversaries can exploit. Training data sets are often ad-hoc in nature; they're so large that it's hard to create version-by-version snapshots, and they're also so prone to mislabeling that researchers are always making changes to them in order to improve their accuracy. All of this suggests that poisoning training data might be easier than it sounds.

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found