Online Learning with Switching Costs and Other Adaptive Adversaries
Neural Information Processing Systems
We study the power of different types of adaptive (nonoblivious) adversaries in the setting of prediction with expert advice, under both full-information and bandit feedback. We measure the player's performance using a new notion of regret, also known as policy regret, which better captures the adversary's adaptiveness to the player's behavior. In a setting where losses are allowed to drift, we characterize, in a nearly complete manner, the power of adaptive adversaries with bounded memories and switching costs.
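For readers unfamiliar with the term, the contrast between standard regret and policy regret can be sketched as below. The notation is an assumption made here for illustration and does not appear in the abstract: the player picks actions x_1, ..., x_T from a finite set A, the adaptive adversary's loss f_t may depend on the player's whole action history, and expectations over the player's randomness are omitted for brevity.

```latex
% Assumed notation (not from the abstract): actions x_t \in \mathcal{A},
% adaptive losses f_t(x_1, \dots, x_t) that depend on the action history.

% Standard regret: the comparator changes only the current action and
% keeps the history the player actually produced.
\[
  R_T^{\mathrm{std}}
  = \sum_{t=1}^{T} f_t(x_1, \dots, x_{t-1}, x_t)
  - \min_{x \in \mathcal{A}} \sum_{t=1}^{T} f_t(x_1, \dots, x_{t-1}, x)
\]

% Policy regret: the comparator plays the fixed action x in every round,
% so the adversary's reaction to that alternative history is accounted for.
\[
  R_T^{\mathrm{pol}}
  = \sum_{t=1}^{T} f_t(x_1, \dots, x_t)
  - \min_{x \in \mathcal{A}} \sum_{t=1}^{T} f_t(x, \dots, x)
\]

% A switching-cost adversary is a memory-1 special case, with an
% oblivious loss \ell_t plus a unit charge for changing actions:
\[
  f_t(x_{t-1}, x_t) = \ell_t(x_t) + \mathbb{1}[x_t \neq x_{t-1}]
\]
```

Under this reading, a bounded-memory adversary is one whose f_t depends only on the last few actions, and the switching-cost adversary is the simplest such case.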
Mar-13-2024, 19:07:38 GMT