NeurIPS2021_ImperfectCommmunicationBandits
We consider the case where each message fails with probability $1 - p$ and each agent $i$ uses the messages it receives from its neighbors with probability $p_i$. This is equivalent to each agent $i$ receiving messages from its neighbors with probability $p_i p$. Let $\mathbb{1}\{(i,j) \in E_t\}$ be the indicator random variable that takes value 1 if agent $i$ receives the reward value and arm id from agent $j$ at time $t$, and 0 otherwise.

We start by proving some useful lemmas.

Lemma 1. (Restatement of results from [3]) Let $\eta_k = $ [...] Thus we have
$$P\big(A_i(t+1) = k,\ N^i_k(t) > \eta_k\big) \le P\big(\hat{\mu}^i_1(t) \le \mu_1 - C^i_1(t)\big) + P\big(\hat{\mu}^i_k(t) \ge \mu_k + C^i_k(t)\big).$$
This concludes the proof of Lemma 1.

Lemma 2. Let $\bar{\chi}(G)$ be the clique covering number of graph $G$. Let $\eta_k = $ [...] Let $\mathcal{C}$ be a non-overlapping clique covering of $G$. Then we have that $\eta_k - |C| < N^i_k(\tau^i_{k,C}) \le \eta_k$. From regret results it follows that the regret for this case is greater than the regret for the case where $\tau^i_{k,C} < \tau_{k,C}$ for some (or all) $i$.

We analyse the expected number of times agents pull suboptimal arm $k$ as follows:
$$\ldots \sum P\big(\hat{\mu}^i_1(t) \le \mu_1 - C^i_1(t)\big) + P\big(\hat{\mu}^i_k(t) \ge \mu_k + C^i_k(t)\big), \tag{29}$$
where (a) follows from the fact that the clique covering is non-overlapping. This concludes the proof of Lemma 2.

Lemma 3. Let $d_i(G)$ be the degree of agent $i$ in graph $G$.
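The equivalence stated at the start of this section (a message that survives with probability $p$ and is then used with probability $p_i$ behaves like a message received with probability $p_i p$) can be checked numerically. The following is a minimal sketch, assuming that message delivery and agent $i$'s decision to use a message are independent events; the function name `simulate_effective_rate` and the values `p = 0.8`, `p_i = 0.5` are hypothetical and chosen only for illustration.

```python
import random


def simulate_effective_rate(p, p_i, trials=200_000, seed=0):
    """Estimate how often agent i ends up with a usable message from one neighbor.

    Assumption (not stated explicitly in the text): delivery and usage
    are independent Bernoulli events with success probabilities p and p_i.
    """
    rng = random.Random(seed)
    used = 0
    for _ in range(trials):
        delivered = rng.random() < p    # message is not dropped (fails w.p. 1 - p)
        accepted = rng.random() < p_i   # agent i chooses to use the message
        if delivered and accepted:
            used += 1
    return used / trials


if __name__ == "__main__":
    p, p_i = 0.8, 0.5  # illustrative values only
    estimate = simulate_effective_rate(p, p_i)
    print(f"empirical rate: {estimate:.3f}, p_i * p: {p_i * p:.3f}")
```

Under the independence assumption the empirical rate should be close to the product $p_i p$, which is the effective edge probability behind the indicator $\mathbb{1}\{(i,j) \in E_t\}$.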
Supplementary proofs from Section 2
We begin with a simple lemma showing that the values of the levels are monotone: Lemma A.1. First, we note that the second part of the lemma holds by lines 15-16. Let $z^i_l$ and $z^i_h$ be the values of $z_l$ and $z_h$ in Algorithm 2 on line 9 on window $i$. There are two cases, depending on whether an element $e^\star$ was added to the solutions or not. Suppose no element $e^\star$ was added to the solution. Then all the levels remain the same.