Robust Lipschitz Bandits to Adversarial Corruptions
Kang, Yue, Hsieh, Cho-Jui, Lee, Thomas C. M.
Multi-armed Bandit (MAB) [3] is a fundamental and powerful framework for sequential decision-making problems. Given the potential existence of malicious users in real-world scenarios [10], a recent line of work considers the stochastic bandit problem under adversarial corruptions: an agent adaptively updates its policy to choose an arm from the arm set, while an adversary may contaminate the reward generated by the stochastic bandit before the agent observes it. To robustify bandit learning under adversarial corruptions, several algorithms have been developed for the traditional MAB [16, 18, 29] and for contextual linear bandits [6, 12, 17, 27, 39]. These works consider either the weak adversary [29], which has access to all past data but not the current action when choosing its attack, or the strong adversary [6], which additionally observes the current action before contaminating the reward. Details of these two adversaries are elaborated in Section 3.

In practice, bandits under adversarial corruptions apply to many real-world problems, such as pay-per-click advertising with click fraud and recommendation systems with fake reviews [29], and it has been empirically validated that stochastic MABs are vulnerable to even slight corruption [12, 15, 18]. Although there has been extensive research on the adversarial robustness of stochastic bandits, most existing works consider problems with a discrete arm set, such as the traditional MAB and the contextual linear bandit. In this paper, we investigate robust bandit algorithms against adversarial corruptions in the Lipschitz bandit setting, where a continuous (infinite) arm set lies in a known metric space with covering dimension d and the expected reward function is an unknown Lipschitz function.
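To make the interaction protocol concrete, the following minimal Python sketch simulates the corrupted Lipschitz bandit loop on the unit interval under a weak adversary. The mean-reward function mu, the attack in weak_adversary, and the horizon and corruption budget (T, C) are illustrative assumptions, not the paper's formal setup.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup: arms live in the metric space ([0, 1], |.|) and the
# unknown mean-reward function mu is Lipschitz: |mu(x) - mu(y)| <= |x - y|.
def mu(x: float) -> float:
    return 0.5 + 0.3 * np.sin(3.0 * x)  # |mu'| <= 0.9, hence 1-Lipschitz

def weak_adversary(history, budget_left):
    # Weak adversary: it sees all past (arm, reward) pairs but NOT the
    # current arm, so it commits to this round's corruption before the pull.
    if budget_left <= 0:
        return 0.0
    return -0.5  # illustrative attack: uniformly push rewards down

T, C = 1000, 50.0            # horizon and total corruption budget (assumed)
budget_left, history = C, []

for t in range(T):
    c_t = weak_adversary(history, budget_left)  # committed before the arm
    arm = rng.uniform(0.0, 1.0)                 # placeholder uniform policy
    reward = mu(arm) + rng.normal(0.0, 0.1)     # stochastic reward
    observed = reward + c_t                     # the agent only sees this
    budget_left -= abs(c_t)
    history.append((arm, observed))

A strong adversary would instead receive the chosen arm as an additional argument when selecting c_t; a robust algorithm would replace the uniform placeholder policy while the rest of the loop stays unchanged.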