Near Optimal Adversarial Attack on UCB Bandits
We consider a stochastic multi-arm bandit problem where rewards are subject to adversarial corruption. We propose a novel attack strategy that manipulates a UCB principle into pulling some non-optimal target arm $T - o(T)$ times with a cumulative cost that scales as $\sqrt{\log T}$, where $T$ is the number of rounds. We also prove the first lower bound on the cumulative attack cost. Our lower bound matches our upper bound up to $\log \log T$ factors, showing our attack to be near optimal.
Aug-21-2020
- Genre:
- Research Report (0.65)
- Industry:
- Government > Military (0.74)
- Health & Medicine > Pharmaceuticals & Biotechnology (0.64)
- Information Technology > Security & Privacy (0.74)
- Technology:
- Information Technology
- Artificial Intelligence (0.47)
- Data Science > Data Mining
- Big Data (0.35)
- Security & Privacy (0.74)
- Information Technology