Stochastic Linear Optimization with Adversarial Corruption
Yingkai Li, Edmund Y. Lou, Liren Shan
The multi-armed bandit problem has been extensively studied in computer science, operations research, and economics since the seminal work of Robbins (1952). It is a model for sequential decision-making in which a player chooses, at each time step, among a finite set of available arms and receives a reward for the chosen decision. The player's objective is to minimize the difference, called regret, between the rewards she receives and the rewards accumulated by the best arm (formalized below). In the stochastic multi-armed bandit problem, the reward of each arm is drawn from a probability distribution; in adversarial multi-armed bandit models, by contrast, there is typically no assumption imposed on the sequence of rewards received by the player.

In recent work, Lykouris et al. (2018) introduce a model in which an adversary can corrupt the stochastic reward generated by an arm pull. They provide an algorithm and show that the regret in this "middle ground" scenario degrades smoothly with the amount of corruption injected by the adversary. Gupta et al. (2019) present an alternative algorithm that gives a significant improvement.
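The regret notion above admits a standard formalization; the following display is a minimal sketch assuming $T$ rounds, $K$ arms with mean rewards $\mu_1, \dots, \mu_K$, and $a_t$ denoting the arm pulled at round $t$ (this notation is illustrative, not drawn from the paper):

\[
  \mathrm{Regret}(T) \;=\; T \cdot \max_{1 \le i \le K} \mu_i \;-\; \mathbb{E}\!\Big[\sum_{t=1}^{T} \mu_{a_t}\Big].
\]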
Sep-4-2019