Cooperative Multi-Agent Constrained Stochastic Linear Bandits

Afsharrad, Amirhossein; Oftadeh, Parisa; Moradipari, Ahmadreza; Lall, Sanjay

Oct-22-2024–arXiv.org Machine Learning

In this study, we explore a cooperative multi-agent stochastic linear bandit setting involving a network of N agents that communicate locally to minimize their collective regret while keeping their expected cost under a specified threshold τ. Each agent faces a distinct linear bandit problem characterized by its own reward and cost parameters (its local parameters). The agents aim to identify the best overall action with respect to the average of these local parameters, called the global parameters. In each round, a randomly chosen agent selects an action based on its current knowledge of the system; all agents then execute this action and observe their individual rewards and costs. We propose a safe distributed upper confidence bound algorithm, called MA-OPLB, and establish a high-probability bound on its T-round regret. MA-OPLB uses an accelerated consensus method through which agents estimate the network-wide average rewards and costs by exchanging information with their neighbors. We also evaluate the proposed algorithm experimentally on different network structures.

Stochastic linear bandits have been widely studied in decision-making problems with a linear structure, such as recommendation systems and path routing [1], [2]. In these problems, at each time step, an agent selects an action and receives a random reward whose expected value depends linearly on the action's context. The agent's objective is to maximize its total reward over T rounds.
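The abstract says MA-OPLB relies on an accelerated consensus method but does not specify it here. Purely as an illustration, the sketch below compares plain gossip averaging, x_{k+1} = W x_k, with Chebyshev-polynomial acceleration, one standard way to speed up average consensus; we do not claim it is the exact scheme MA-OPLB uses. The ring topology, the 1/3 mixing weights, and the function names (`ring_weights`, `ring_lambda`, `plain_consensus`, `chebyshev_consensus`) are hypothetical choices made for this example.

```python
import math


def ring_weights(n):
    """Hypothetical gossip matrix for an n-agent ring (n >= 4): each agent keeps
    weight 1/3 and gives 1/3 to each neighbour, so W is symmetric and doubly stochastic."""
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        w[i][i] = 1.0 / 3.0
        w[i][(i - 1) % n] = 1.0 / 3.0
        w[i][(i + 1) % n] = 1.0 / 3.0
    return w


def ring_lambda(n):
    """Largest |eigenvalue| of ring_weights(n) apart from the eigenvalue 1.
    The ring's eigenvalues are 1/3 + (2/3) cos(2*pi*k/n), k = 0, ..., n-1."""
    return max(abs(1.0 / 3.0 + 2.0 / 3.0 * math.cos(2.0 * math.pi * k / n))
               for k in range(1, n))


def matvec(w, x):
    # One communication round: every agent forms a weighted sum of its neighbours' values.
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def plain_consensus(w, x0, rounds):
    # x_{k+1} = W x_k; the error shrinks roughly like lam**k.
    x = list(x0)
    for _ in range(rounds):
        x = matvec(w, x)
    return x


def chebyshev_consensus(w, x0, lam, rounds):
    """Illustrative Chebyshev-accelerated consensus (not necessarily MA-OPLB's scheme):
    x_k = T_k(W/lam) x0 / T_k(1/lam), via the three-term Chebyshev recurrence.
    Requires 0 < lam < 1; the error shrinks like 1 / T_k(1/lam)."""
    if rounds == 0:
        return list(x0)
    c_prev, c = 1.0, 1.0 / lam           # T_0(1/lam), T_1(1/lam)
    x_prev, x = list(x0), matvec(w, x0)  # x_0 and x_1 = W x_0
    for _ in range(rounds - 1):
        c_next = (2.0 / lam) * c - c_prev
        wx = matvec(w, x)
        x_next = [((2.0 / lam) * c * a - c_prev * b) / c_next
                  for a, b in zip(wx, x_prev)]
        x_prev, x = x, x_next
        c_prev, c = c, c_next
    return x
```

Both routines use one neighbour exchange per round and preserve the network average because W is doubly stochastic. With the same number of rounds, the Chebyshev iterate should sit much closer to the true average, since its worst-case error decays like 1/T_k(1/λ) instead of λ^k.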
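To make the optimistic-pessimistic idea behind a safe upper confidence bound rule concrete, here is a minimal sketch of one action-selection step. It assumes a finite set of 2-D actions, ridge-regression estimates of the global reward and cost parameters built from network-averaged observations (which a consensus step would only approximate), a hand-picked confidence radius `beta`, and a known safe fallback action. An action is kept only if its pessimistic (upper-confidence) cost estimate stays within τ; among the remaining actions, the one with the largest optimistic reward is chosen. This is a simplified deterministic-action illustration, not the MA-OPLB algorithm itself, and `select_safe_action` and its helpers are hypothetical names.

```python
import math


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def matvec(m, x):
    return [dot(row, x) for row in m]


def inv2(m):
    # Closed-form inverse of a 2x2 matrix (the Gram matrix below is positive definite).
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


def ridge_estimate(xs, ys, reg=1.0):
    """Ridge regression for 2-D actions: returns (estimate, V^{-1}),
    where V = reg*I + sum_t x_t x_t^T and estimate = V^{-1} sum_t y_t x_t."""
    v = [[reg, 0.0], [0.0, reg]]
    b = [0.0, 0.0]
    for x, y in zip(xs, ys):
        for i in range(2):
            b[i] += y * x[i]
            for j in range(2):
                v[i][j] += x[i] * x[j]
    v_inv = inv2(v)
    return matvec(v_inv, b), v_inv


def select_safe_action(action_set, hist_x, hist_r, hist_c, tau, beta, safe_action):
    """One optimistic-pessimistic selection step (hypothetical helper).

    hist_r / hist_c are assumed to be network-averaged rewards / costs, so the
    estimates target the global (average) parameters."""
    theta_hat, v_inv = ridge_estimate(hist_x, hist_r)
    mu_hat, _ = ridge_estimate(hist_x, hist_c)  # same actions, so same V^{-1}
    best, best_ucb = None, -math.inf
    for x in action_set:
        # Confidence width beta * ||x||_{V^{-1}}; beta is hand-picked here (assumption).
        width = beta * math.sqrt(dot(x, matvec(v_inv, x)))
        if dot(x, mu_hat) + width > tau:  # pessimistic cost exceeds the threshold
            continue
        ucb = dot(x, theta_hat) + width   # optimistic reward
        if ucb > best_ucb:
            best, best_ucb = x, ucb
    # Fall back to the assumed known safe action if nothing passes the cost check.
    return best if best is not None else safe_action
```

A larger `beta` makes the cost check more conservative: once the confidence widths are wide enough that no action passes, the rule falls back to the known safe action.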

agent, algorithm, bandit problem