Distributed Optimization via Kernelized Multi-armed Bandits

Ayush Rai, Shaoshuai Mou

arXiv.org Artificial Intelligence 

Multi-armed bandit algorithms provide solutions for sequential decision-making in which learning takes place by interacting with the environment. In this work, we model a distributed optimization problem as a multi-agent kernelized multi-armed bandit problem with a heterogeneous reward setting. In this setup, the agents collaboratively aim to maximize a global objective function, which is an average of local objective functions. Each agent can access only bandit feedback (a noisy reward) obtained from its associated unknown local function, assumed to have a small norm in a reproducing kernel Hilbert space (RKHS). We present a fully decentralized algorithm, Multi-agent IGP-UCB (MA-IGP-UCB), which achieves a sub-linear regret bound for popular classes of kernels while preserving privacy: it does not require the agents to share their actions, rewards, or estimates of their local functions. In the proposed approach, the agents sample their individual local functions in a way that benefits the whole network, using a running consensus to estimate the upper confidence bound on the global function. Furthermore, we propose an extension, the Multi-agent Delayed IGP-UCB (MAD-IGP-UCB) algorithm, which reduces the dependence of the regret bound on the number of agents in the network. It achieves improved performance by introducing a delay in the estimation update step, at the cost of more communication.

The problem of distributed optimization deals with the optimization of a function over a network of agents in which the whole function is not completely known to any single agent [1], [2]. In fact, the "global" function can be expressed as an average of "local" functions, each associated with one agent and independent of the others. In particular, our interest lies in the case when these local functions are non-convex, unknown, and expensive to compute or record.
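In symbols, the setup described above can be sketched as follows (notation here is illustrative, with N agents, kernel k, and an RKHS norm bound B assumed for each local function):

```latex
\max_{x \in \mathcal{X}} \; F(x) = \frac{1}{N} \sum_{i=1}^{N} f_i(x),
\qquad
y_{i,t} = f_i(x_{i,t}) + \varepsilon_{i,t},
\qquad
\|f_i\|_{k} \le B,
```

where agent $i$ at round $t$ chooses action $x_{i,t}$ and observes only its own noisy local reward $y_{i,t}$, never $F$ or the other agents' $f_j$.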
To obtain a tractable problem, we assume that these local functions belong to a reproducing kernel Hilbert space (RKHS), a common assumption in the literature [3]–[5]. When dealing with unknown functions, the problem for each agent can be broken down into two parts: sampling and optimization.
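The mechanism described above can be illustrated with a minimal toy sketch: each agent fits a Gaussian-process posterior to its own noisy local rewards, forms a local UCB index, and runs a dynamic-average ("running") consensus with its neighbors to track the network-average UCB, from which it picks its next action. This is not the paper's exact algorithm; the kernel, mixing matrix, local objectives, and the beta parameter below are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf(X, Y, ls=0.5):
    """Squared-exponential kernel matrix between rows of X and Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls ** 2)

def gp_ucb(X_obs, y_obs, X_cand, beta=2.0, noise=0.1):
    """GP posterior on (X_obs, y_obs) at X_cand, as an IGP-UCB-style index."""
    if len(X_obs) == 0:
        return np.full(len(X_cand), beta)          # prior: mu = 0, sigma = 1
    K = rbf(X_obs, X_obs) + noise ** 2 * np.eye(len(X_obs))
    Ks = rbf(X_cand, X_obs)
    mu = Ks @ np.linalg.solve(K, y_obs)
    v = np.linalg.solve(K, Ks.T)
    sigma = np.sqrt(np.clip(1.0 - np.sum(Ks * v.T, axis=1), 0.0, None))
    return mu + beta * sigma

# Toy network: 3 agents on a complete graph, doubly stochastic mixing matrix W.
W = np.array([[0.50, 0.25, 0.25],
              [0.25, 0.50, 0.25],
              [0.25, 0.25, 0.50]])

# Hypothetical local objectives; the global objective is their average.
locals_f = [lambda x: np.sin(3 * x), lambda x: np.cos(2 * x), lambda x: -x ** 2]
X_cand = np.linspace(-1.0, 1.0, 25).reshape(-1, 1)

n_agents, T, noise = 3, 15, 0.1
obs = [([], []) for _ in range(n_agents)]            # per-agent (actions, rewards)
z = np.zeros((n_agents, len(X_cand)))                # consensus estimate of avg UCB
prev_ucb = np.zeros((n_agents, len(X_cand)))

for t in range(T):
    ucb = np.stack([gp_ucb(np.array(obs[i][0]).reshape(-1, 1),
                           np.array(obs[i][1]), X_cand, noise=noise)
                    for i in range(n_agents)])
    # Running (dynamic average) consensus: mix with neighbors' estimates,
    # then add the innovation in each agent's local UCB index.
    z = W @ z + (ucb - prev_ucb)
    prev_ucb = ucb
    for i in range(n_agents):
        x = X_cand[np.argmax(z[i])]                  # highest estimated global UCB
        y = locals_f[i](x[0]) + noise * rng.standard_normal()
        obs[i][0].append(x[0])
        obs[i][1].append(y)

best = X_cand[np.argmax(z.mean(axis=0))][0]
print(f"consensus argmax after {T} rounds: x = {best:.2f}")
```

Note that no agent ever transmits its actions, rewards, or GP estimates, only its consensus variable `z[i]`, mirroring the privacy property claimed in the abstract; the delayed variant would instead mix `z` for several communication steps before each update.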