Non-Stationary Delayed Bandits with Intermediate Observations

Vernade, Claire; Gyorgy, Andras; Mann, Timothy

Aug-11-2020 – arXiv.org Machine Learning

Online recommender systems often face long delays in receiving feedback, especially when optimizing for some long-term metrics. While mitigating the effects of delays in learning is well-understood in stationary environments, the problem becomes much more challenging when the environment changes. In fact, if the timescale of the change is comparable to the delay, it is impossible to learn about the environment, since the available observations are already obsolete.

Delayed feedback in online learning has been addressed both in the full information setting (see, e.g., Joulani et al., 2013, and the references therein), and in the bandit setting (see, e.g., Mandel et al., 2015; Vernade et al., 2017; Cesa-Bianchi et al., 2019, and the references therein), assuming both stochastic and adversarial environments. The main takeaway message from these studies is that, for bandits, a constant delay D results in an extra additive O(√(DT)) term in the regret in adversarial settings, or an additive O(D) term in stochastic settings. The aforementioned ...
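To make the stationary, constant-delay setting concrete, the following is a minimal simulation sketch. It is not the algorithm of the paper: it runs standard UCB1 on Bernoulli arms and simply holds each reward back for D rounds before the learner sees it. The function name delayed_ucb, its parameters, and the round-robin rule used while some arm has no feedback yet are all assumptions made for illustration.

```python
import math
import random
from collections import deque


def delayed_ucb(means, horizon, delay, seed=0):
    """UCB1 on Bernoulli arms whose rewards arrive `delay` rounds late.

    Illustrative sketch only (hypothetical helper, not the paper's method).
    Returns (pseudo_regret, pulls_per_arm, number_of_rewards_observed).
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k    # rewards observed so far, per arm
    sums = [0.0] * k    # sum of observed rewards, per arm
    pulls = [0] * k     # actions taken, per arm (observed or not)
    pending = deque()   # (round_played + delay, arm, reward), oldest first
    best = max(means)
    regret = 0.0

    for t in range(horizon):
        # A reward played at round s becomes visible at round s + delay + 1,
        # so delay=0 recovers the usual bandit protocol.
        while pending and pending[0][0] < t:
            _, arm, reward = pending.popleft()
            counts[arm] += 1
            sums[arm] += reward

        unseen = [a for a in range(k) if counts[a] == 0]
        if unseen:
            # Assumption: cycle through arms that have no feedback yet.
            arm = unseen[t % len(unseen)]
        else:
            n = sum(counts)
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(n) / counts[a]),
            )

        reward = 1.0 if rng.random() < means[arm] else 0.0
        pending.append((t + delay, arm, reward))
        pulls[arm] += 1
        regret += best - means[arm]  # pseudo-regret against the best mean

    return regret, pulls, sum(counts)


if __name__ == "__main__":
    for d in (0, 10, 100):
        r, _, seen = delayed_ucb([0.5, 0.6], horizon=5000, delay=d)
        print(f"delay={d:4d}  observed={seen}  pseudo-regret={r:.1f}")
```

Running the script for several values of D gives a rough feel for how late feedback costs the learner in a stationary problem. A single seeded run is noisy and should not be read as confirming the O(D) rate. In a non-stationary environment the same buffer would deliver rewards that may no longer reflect the current means, which is the difficulty the abstract describes.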
