Distributed Bandit Learning: How Much Communication is Needed to Achieve (Near) Optimal Regret
