Adversarial Online Learning with Temporal Feedback Graphs

Khashayar Gatmiry, Jon Schneider

arXiv.org Artificial Intelligence 

Prediction with expert advice is one of the most fundamental problems in online learning. In its simplest form, a learner must choose one of K actions (possibly a randomized mixture of actions) in each of T rounds. An adversary then reveals a loss vector containing the loss for each action, the learner incurs the corresponding loss, and play proceeds to the next round. The goal of the learner in such settings is usually to minimize their regret: the gap between their total utility at the end of the game and the maximum utility they could have received had they played the best fixed action in hindsight. Notably, it is possible to construct algorithms for the learner that achieve regret sublinear in T against any adversarially chosen sequence of losses. Traditionally, a learner may use the entire history of losses up until round t to decide their action at round t. In this paper, we investigate what happens if we restrict the learner's action at time t to depend only on the losses in some subset of these rounds. Formally, we require the learner's (randomized) action at time t to be a function of the losses in some subset of rounds S
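As a point of reference for the setting the abstract describes, below is a minimal sketch of the classical full-information experts problem solved with the standard Hedge (multiplicative weights) algorithm, which attains O(sqrt(T log K)) regret. The function name `hedge`, the learning rate `eta`, and the synthetic losses are illustrative choices, not from the paper; the paper itself studies the restricted-feedback variant in which the round-t action may depend only on losses from an allowed subset of earlier rounds.

```python
import numpy as np

def hedge(losses, eta):
    """Hedge (multiplicative weights) sketch for prediction with expert advice.

    losses: (T, K) array; losses[t, k] in [0, 1] is the loss of action k at round t.
    eta:    learning rate, e.g. sqrt(2 * log(K) / T).
    Returns the learner's total expected loss over T rounds.
    """
    T, K = losses.shape
    weights = np.ones(K)
    total_loss = 0.0
    for t in range(T):
        p = weights / weights.sum()          # randomized mixture over the K actions
        total_loss += p @ losses[t]          # expected loss incurred this round
        weights *= np.exp(-eta * losses[t])  # full-information update: uses the whole loss vector
    return total_loss

# Regret = learner's loss minus the loss of the best fixed action in hindsight;
# with this eta it is O(sqrt(T log K)), i.e. sublinear in T.
rng = np.random.default_rng(0)
T, K = 10_000, 5
losses = rng.random((T, K))
eta = np.sqrt(2 * np.log(K) / T)
regret = hedge(losses, eta) - losses.sum(axis=0).min()
print(f"regret ~ {regret:.1f} (vs. T = {T})")
```

Under the paper's restriction, the update at round t would be allowed to consult only the losses from the permitted subset of earlier rounds, rather than the full history used above.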
