Online Learning with Switching Costs and Other Adaptive Adversaries

Cesa-Bianchi, Nicolo; Dekel, Ofer; Shamir, Ohad

Jun-1-2013–arXiv.org Machine Learning

We study the power of different types of adaptive (nonoblivious) adversaries in the setting of prediction with expert advice, under both full-information and bandit feedback. We measure the player's performance using a new notion of regret, also known as policy regret, which better captures the adversary's adaptiveness to the player's behavior. In a setting where losses are allowed to drift, we characterize, in a nearly complete manner, the power of adaptive adversaries with bounded memories and switching costs. In particular, we show that with switching costs, the attainable rate with bandit feedback is $\widetilde{\Theta}(T^{2/3})$. Interestingly, this rate is significantly worse than the $\Theta(\sqrt{T})$ rate attainable with switching costs in the full-information case. Via a novel reduction from experts to bandits, we also show that a bounded memory adversary can force $\widetilde{\Theta}(T^{2/3})$ regret even in the full-information case, proving that switching costs are easier to control than bounded memory adversaries. Our lower bounds rely on a new stochastic adversary strategy that generates loss processes with strong dependencies.
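
To make the regret notion in the abstract concrete, here is a minimal Python sketch of how regret could be computed against a switching-cost adversary. It assumes a fixed loss table `losses[t][k]` and a unit cost per switch. The function name, the `switch_cost` parameter and the toy data are illustrative assumptions, not taken from the paper. The comparator is the best single action played on every round, which by construction never pays a switching cost.

```python
from typing import Sequence


def switching_cost_policy_regret(
    losses: Sequence[Sequence[float]],
    actions: Sequence[int],
    switch_cost: float = 1.0,  # assumed unit cost per switch (illustrative)
) -> float:
    """Regret of `actions` versus the best fixed action, with switching costs.

    losses[t][k] is the loss of action k on round t (hypothetical input format).
    The player pays its per-round loss plus `switch_cost` whenever it changes
    action between consecutive rounds. A fixed action never switches, so the
    comparator pays only its cumulative loss.
    """
    if len(losses) != len(actions):
        raise ValueError("losses and actions must cover the same rounds")
    if not losses:
        return 0.0

    # Cumulative loss of the actions the player actually chose.
    player_loss = sum(row[a] for row, a in zip(losses, actions))

    # Number of rounds on which the player changed its action.
    switches = sum(1 for prev, cur in zip(actions, actions[1:]) if prev != cur)

    # Best fixed action in hindsight (incurs zero switching cost).
    num_actions = len(losses[0])
    best_fixed = min(sum(row[k] for row in losses) for k in range(num_actions))

    return player_loss + switch_cost * switches - best_fixed


# Toy example: two actions, three rounds.
toy_losses = [[0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
chasing = switching_cost_policy_regret(toy_losses, [0, 1, 0])
staying = switching_cost_policy_regret(toy_losses, [0, 0, 0])
```

In this toy run the player that chases the per-round minimum pays no loss but switches twice, so it ends with higher regret than the player that stays on action 0 throughout.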

artificial intelligence, data mining, machine learning


