Reviews: Recovering Bandits

Jan-25-2025, 23:18:58 GMT–Neural Information Processing Systems

The major point that needs to be clarified for me is the distinction between single play regret and multiple play regret. First, from my understanding, the single play regret would be defined only for strategies that select each arm at most once during the d steps (and therefore d cannot be larger than K in this case, right?). The UCB-Z algorithm, for example, is not of this type. It is a bit odd to have a regret notion that only applies to very specific algorithms. Second, regarding the distinction between single and multiple play regret, am I right that $\mathbb{E}[R_T^{(d,m)}] \geq \mathbb{E}[R_T^{(d)}]$? That is, one would in principle care about the multiple play lookahead regret, since one should be allowed to play an arm multiple times during the d time steps over which we optimize.
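The question of whether multiple play regret dominates single play regret can be checked on a toy example. The sketch below is a hypothetical illustration, not the paper's definitions. It assumes deterministic recovery functions `f[k](z)`, where `z` is the number of rounds since arm k was last played. It brute-forces a d-step lookahead oracle in two ways: over plans that use each arm at most once (`itertools.permutations`, which is empty when d > K) and over plans that may repeat arms (`itertools.product`). The names `saturating`, `plan_value`, and `best_plan` are invented for this example. Every single-play plan is also a multiple-play plan, so the multiple-play optimum is never smaller. For the same learner plan, regret measured against it is therefore never smaller either.

```python
import itertools
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical recovery function (illustrative assumption): expected reward of
# an arm as a function of z, the number of rounds since it was last played.
Recovery = Callable[[int], float]


def saturating(cap: float, rate: int) -> Recovery:
    """Reward grows linearly in z and saturates at `cap` once z >= rate."""
    return lambda z: cap * min(z, rate) / rate


def plan_value(plan: Sequence[int], f: Sequence[Recovery], z0: Sequence[int]) -> float:
    """Total expected reward of playing the arms in `plan` in order,
    starting from z0[k] = rounds since arm k was last played."""
    z = list(z0)
    total = 0.0
    for arm in plan:
        total += f[arm](z[arm])
        z = [zk + 1 for zk in z]  # every arm ages by one round...
        z[arm] = 1                # ...and the arm just played was last played 1 round ago
    return total


def best_plan(
    f: Sequence[Recovery], z0: Sequence[int], d: int, multiple_play: bool
) -> Optional[Tuple[Tuple[int, ...], float]]:
    """Brute-force d-step lookahead oracle (hypothetical toy version).
    Single play: each arm at most once in the d steps (needs d <= K).
    Multiple play: arms may be repeated within the d steps."""
    K = len(f)
    if multiple_play:
        plans = itertools.product(range(K), repeat=d)
    else:
        plans = itertools.permutations(range(K), d)  # yields nothing when d > K
    scored = [(p, plan_value(p, f, z0)) for p in plans]
    if not scored:
        return None  # no admissible single-play plan exists when d > K
    return max(scored, key=lambda pv: pv[1])


# Usage: arm 0 recovers immediately, arm 1 recovers slowly.
f = [saturating(1.0, 1), saturating(0.5, 4)]
z0 = [5, 5]
single = best_plan(f, z0, d=2, multiple_play=False)
multi = best_plan(f, z0, d=2, multiple_play=True)
print(single, multi)  # ((0, 1), 1.5) ((0, 0), 2.0)

learner_plan = (1, 0)  # hypothetical plan chosen by some learning policy
v = plan_value(learner_plan, f, z0)
print(single[1] - v, multi[1] - v)  # single vs multiple play regret: 0.0 0.5
```

Brute force is exponential in d and only meant to make the set inclusion concrete. With d = 3 and K = 2, the single-play oracle returns `None`, which matches the observation that single play regret requires d to be at most K.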

algorithm, play regret, recovery function
