Reward Poisoning in Reinforcement Learning: Attacks Against Unknown Learners in Unknown Environments
Amin Rakhsha, Xuezhou Zhang, Xiaojin Zhu, Adish Singla
arXiv.org Artificial Intelligence
We study black-box reward poisoning attacks against reinforcement learning (RL), in which an adversary aims to manipulate the rewards to mislead a sequence of RL agents running unknown algorithms into learning a nefarious policy in an environment unknown to the adversary a priori. That is, our attack makes minimal assumptions about the adversary's prior knowledge: it has no initial knowledge of the environment or the learner, nor does it observe the learner's internal mechanism beyond the learner's performed actions. We design a novel black-box attack, U2, that provably achieves performance nearly matching that of the state-of-the-art white-box attack, demonstrating the feasibility of reward poisoning even in this most challenging black-box setting.
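For illustration, the sketch below shows the reward-poisoning threat model on a toy problem: an attacker-controlled wrapper sits between the environment and an unknown learner, observes only the learner's actions, and perturbs each reward within a per-step budget to steer learning toward a target policy. This is a hypothetical minimal example, not the paper's U2 attack; the names (`ChainMDP`, `PoisonedEnv`, `epsilon`) and the simple bonus/penalty perturbation rule are illustrative assumptions.

```python
# Illustrative sketch of the reward-poisoning threat model, NOT the U2 attack.
# The attacker only sees the learner's (state, action) and the reward; it never
# inspects the learner's internals, matching the black-box setting above.
import random
from collections import defaultdict

class ChainMDP:
    """Tiny 3-state chain; action 1 moves right, true reward 1 at the end."""
    n_states, n_actions = 3, 2

    def reset(self):
        return 0

    def step(self, state, action):
        next_state = min(state + 1, 2) if action == 1 else max(state - 1, 0)
        reward = 1.0 if next_state == 2 else 0.0
        return next_state, reward

class PoisonedEnv:
    """Attacker wrapper: perturbs rewards within budget epsilon to favor a target policy."""

    def __init__(self, env, target_policy, epsilon=2.0):
        self.env, self.target_policy, self.epsilon = env, target_policy, epsilon

    def reset(self):
        return self.env.reset()

    def step(self, state, action):
        next_state, reward = self.env.step(state, action)
        # Bonus for following the target (nefarious) policy, penalty otherwise.
        delta = self.epsilon if action == self.target_policy[state] else -self.epsilon
        return next_state, reward + delta

def q_learning(env, episodes=500, alpha=0.5, gamma=0.9, explore=0.1):
    """A learner unknown to the attacker: plain epsilon-greedy Q-learning."""
    Q = defaultdict(float)
    for _ in range(episodes):
        state = env.reset()
        for _ in range(20):
            if random.random() < explore:
                action = random.randrange(ChainMDP.n_actions)
            else:
                action = max(range(ChainMDP.n_actions), key=lambda a: Q[(state, a)])
            next_state, reward = env.step(state, action)
            best_next = max(Q[(next_state, a)] for a in range(ChainMDP.n_actions))
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
    return Q

if __name__ == "__main__":
    nefarious = {0: 0, 1: 0, 2: 0}  # target policy: always move away from the true reward
    Q = q_learning(PoisonedEnv(ChainMDP(), nefarious))
    learned = {s: max(range(2), key=lambda a: Q[(s, a)]) for s in range(3)}
    print("learned policy:", learned)  # typically matches the nefarious target policy
```

Because the poisoning bonus outweighs the true reward in this toy setup, the learner converges to the always-left target policy despite the real reward lying at the right end of the chain; the paper's contribution is achieving this effect with provable guarantees when both the environment and the learner are unknown.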
Feb-16-2021
- Country:
  - North America
    - Canada > Ontario
      - Toronto (0.14)
    - United States > Wisconsin (0.14)
- Genre:
  - Research Report (0.40)
- Industry:
  - Government (0.69)
  - Information Technology > Security & Privacy (0.93)
  - Transportation (0.89)
- Technology: