Near Optimal Adversarial Attack on UCB Bandits

Aug-21-2020–arXiv.org Machine Learning

We consider a stochastic multi-arm bandit problem where rewards are subject to adversarial corruption. We propose a novel attack strategy that manipulates a UCB principle into pulling some non-optimal target arm $T - o(T)$ times with a cumulative cost that scales as $\sqrt{\log T}$, where $T$ is the number of rounds. We also prove the first lower bound on the cumulative attack cost. Our lower bound matches our upper bound up to $\log \log T$ factors, showing our attack to be near optimal.

artificial intelligence, big data, data mining, (16 more...)

arXiv.org Machine Learning

Aug-21-2020

arXiv.org PDF

Add feedback

Country:
- North America > United States > Illinois (0.04)

Genre:
- Research Report (0.65)

Industry:
- Information Technology > Security & Privacy (0.74)
- Government > Military (0.74)
- Health & Medicine > Pharmaceuticals & Biotechnology (0.64)

Technology:
- Information Technology
  - Security & Privacy (0.74)
  - Artificial Intelligence (0.47)
  - Data Science > Data Mining
    - Big Data (0.35)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found