Non-Stationary Delayed Bandits with Intermediate Observations

Vernade, Claire; Gyorgy, Andras; Mann, Timothy

arXiv.org Machine Learning 

Online recommender systems often face long delays in receiving feedback, especially when optimizing for some long-term metrics. While mitigating the effects of delays in learning is well understood in stationary environments, the problem becomes much more challenging when the environment changes. In fact, if the timescale of the change is comparable to the delay, it is impossible to learn about the environment, since the available observations are already obsolete.

Delayed feedback in online learning has been addressed both in the full information setting (see, e.g., Joulani et al., 2013, and the references therein) and in the bandit setting (see, e.g., Mandel et al., 2015; Vernade et al., 2017; Cesa-Bianchi et al., 2019, and the references therein), assuming both stochastic and adversarial environments. The main takeaway message from these studies is that, for bandits, a constant delay $D$ results in an extra additive $O(\sqrt{DT})$ term in the regret in adversarial settings, or an additive $O(D)$ term in stochastic settings. The aforementioned ...
