10 Supplementary Material for the paper LeadCache Regret Optimal Caching in Networks by and
–Neural Information Processing Systems
Following Cohen and Hazan [2015] we derive a general expression for the regret upper bound applicable to any linear reward function under an anytime FTPL policy. This is accomplished in the following steps. First, we extend the argument of Cohen and Hazan [2015] to the anytime setting. Then, we specialize this bound to our problem setting. Recall the notations used in the paper - the aggregate file-request sequence from all users is denoted by {xt}t 1 and the virtual cache configuration sequence is denoted by {zt}t 1. Define the cumulative requests up to time tas: Xt = Furthermore, since the max function 14 is convex, we may interchange the expectation and gradient to obtain Φηt(Xt) =E(zt) [Bertsekas, 1973, Proposition 2.2]. Plugging in the expression of the inner product from Eqn. (25) in expression (26), we obtain: Bounding the term (a): Next, to upper bound the expected regret, we control term (a) in inequality (28).
Neural Information Processing Systems
Apr-25-2026, 03:10:06 GMT
- Technology: