NeurIPS2021_ImperfectCommmunicationBandits

Madhu

Neural Information Processing Systems 

We consider the case where each message fails with probability 1 p and each agent i uses the messages it receives from its neighbors with probability pi.This is equivalent to each agent ireceiving messages from its neighbors with probability pip.Let 1{(i,j) 2 Et}be the indicator random variable that takes value 1 if agent i receives reward value and arm id from agent j at time t and 0 otherwise. We start by proving some useful lemmas. Lemma 1. (Restatement of results from [3]) Let k = Thus we have P Ai(t+1) = k,Nik(t) > k P bµi1(t) µ1 Ci1(t) +P bµik(t) µk +Cik(t) This concludes the proof of Lemma 1. Lemma 2. Let (G) is the clique covering number of graph G. Let k = Let C be a non overlapping clique covering of G. Then we have that k |C| < Nik( ik,C) k. From regret results it follows that regret for this case is greater than the regret for the case where ik,C < k,C for some (or all) i. 13 We analyse the expected number of times agents pull suboptimal arm k as follows, X P bµi1(t) µ1 Ci1(t) +P bµik(t) µk +Cik(t), (29) where (a) follows from the fact that clique covering is non overlapping. This concludes the proof of Lemma 2. Lemma 3. Let di(G) be the degree of agent i in graph G.