
Online EXP3 Learning in Adversarial Bandits with Delayed Feedback

Ilai Bistritz, Zhengyuan Zhou, Xi Chen, Nicholas Bambos, Jose Blanchet

Neural Information Processing Systems

Consider a player that in each of T rounds chooses one of K arms. An adversary chooses the cost of each arm in a bounded interval, and a sequence of feedback delays {d_t} that are unknown to the player. After picking arm a_t at round t, the player receives the cost of playing this arm d_t rounds later. In cases where t + d_t > T, this feedback is simply missing.
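The setting above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' exact algorithm: the function name `delayed_exp3`, the fixed step size `eta`, costs scaled to [0, 1], and the choice to apply each importance-weighted estimate at the end of its arrival round are all assumptions made for the example.

```python
import math
import random
from collections import defaultdict


def delayed_exp3(costs, delays, eta, seed=0):
    """EXP3-style play when the cost of round t is observed at round t + delays[t-1].

    costs[t-1][k] is the cost in [0, 1] of arm k at round t (rounds are 1-indexed).
    Returns the list of chosen arms and the number of feedbacks never received.
    Hypothetical helper written for illustration; eta is an assumed fixed step size.
    """
    rng = random.Random(seed)
    T, K = len(costs), len(costs[0])
    cum_est = [0.0] * K            # importance-weighted cumulative cost estimates
    arrivals = defaultdict(list)   # arrival round -> [(arm, cost, play probability)]
    chosen, missing = [], 0

    for t in range(1, T + 1):
        # Exponential weights over estimated costs (shifted for numerical stability).
        low = min(cum_est)
        weights = [math.exp(-eta * (c - low)) for c in cum_est]
        total = sum(weights)
        probs = [w / total for w in weights]

        arm = rng.choices(range(K), weights=probs)[0]
        chosen.append(arm)

        arrival = t + delays[t - 1]
        if arrival > T:
            missing += 1           # t + d_t > T: this feedback is never observed
        else:
            arrivals[arrival].append((arm, costs[t - 1][arm], probs[arm]))

        # Apply every feedback that arrives at the end of round t.
        for a, cost, p in arrivals.pop(t, []):
            cum_est[a] += cost / p

    return chosen, missing
```

For example, `delayed_exp3([[0.0, 1.0]] * 2000, [5] * 2000, eta=0.1)` plays two arms for 2000 rounds with a constant delay of 5; the feedback for the last five rounds falls past T and is counted as missing. Each estimate is divided by the probability the arm had when it was played, not when the feedback arrives, which keeps the estimator unbiased for the observed rounds.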







Supplement to "Model Selection in Contextual Stochastic Bandit Problems"

Neural Information Processing Systems

In Section D we present the proofs for Section 5.1. In Section H we show the proofs of the lower bounds in Section 6. We outline briefly some other direct applications of our results. CORRAL will achieve regret O(√(|L| dT)). B.1 Original Corral: The original Corral algorithm [2] is reproduced below. We reproduce the EXP3.P algorithm (Figure 3.1 in [ ... 's expected replay regret satisfies: ... Therefore total regret is bounded by 6 U(T, δ) log(T). D.2 Applications of Proposition 5.1: We now show that several algorithms are (U, δ, T)-bounded: Lemma D.2.




Supplementary materials for Paper "Bandit Samplers for Training Graph Neural Networks"

Neural Information Processing Systems

We show convergence on validation in terms of timing (seconds) in Figure 1 and Figure 2. Basically, our algorithms converge to much better results in nearly the same duration compared with ... Note that we cannot complete the training of AS-GAT on Reddit because of memory issues. Note that the comparisons of timing between "graph sampling" and "layer sampling" paradigms have ... As a result, we do not compare the timing with "graph sampling" approaches. That is, graph sampling approaches are designed for graph data in which all vertices have labels. To summarize, the "layer sampling" approaches are more flexible and general compared with "graph sampling" ... Before we give the proof of Theorem 1, we first prove the following Lemma 1 that will be used later.