Overleaf Example
–Neural Information Processing Systems
We consider the adversarial linear contextual bandit setting, which allows for the loss functions associated with each of K arms to change over time without restriction. Assuming the d-dimensional contexts are drawn from a fixed known distribution, the worst-case expected regret over the course of T rounds is known to scale as Õ( KdT).
Neural Information Processing Systems
Feb-11-2025, 10:40:54 GMT