QueueUpYourRegrets: AchievingtheDynamic CapacityRegionofMultiplayerBandits

Neural Information Processing Systems 

Our main observation is that the gap between λnt and the accumulated reward of agent n, which we call the QoS regret, behaves like a queue.

Similar Docs  Excel Report  more

TitleSimilaritySource
None found