Relational Weight Optimization for Enhancing Team Performance in Multi-Agent Multi-Armed Bandits
Kotturu, Monish Reddy; Movahed, Saniya Vahedian; Robinette, Paul; Jerath, Kshitij; Redlich, Amanda; Azadeh, Reza
arXiv.org Artificial Intelligence
Multi-Armed Bandits (MABs) are a class of reinforcement learning problems where an agent is presented with a set of arms (i.e., actions), with each arm giving a reward drawn from a probability distribution unknown to the agent (Lattimore and Szepesvári, 2020). The goal of the agent is to maximize its total reward, which requires balancing exploration and exploitation. MABs offer a simple model to simulate decision-making under uncertainty. Practical applications of MAB algorithms include news recommendations (Yang and Toni, 2018), online ad placement (Aramayo et al., 2022), dynamic pricing (Babaioff et al., 2015), and adaptive experimental design (Rafferty et al., 2019). In contrast to single-agent cases, in certain applications such as search and rescue, a team of agents should cooperate with each other to accomplish goals by maximizing team performance. Such problems are solved using Multi-Agent Multi-Armed Bandit (MAMAB) algorithms (Xu et al., 2020). Most existing algorithms rely on the presence of [...]

Using a graph to represent the team behavior ensures that the relationships between the agents are maintained. However, existing works either do not consider the weight of each relationship (graph edges) (Madhushani and Leonard, 2020; Agarwal et al., 2021) or expect the user to manually set those weights (Moradipari et al., 2022).

In this paper, we propose a new approach that combines graph optimization and MAMAB algorithms to enhance team performance by expediting the convergence to consensus of arm means. Our proposed approach:
- improves team performance by optimizing the edge weights in the graph representing the team structure in large constrained teams,
- does not require manual tuning of the graph weights,
- is independent of the MAMAB algorithm and only depends on the consensus formula, and
- formulates the problem as a convex optimization, which is computationally efficient for large teams.
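To make concrete why edge weights matter for consensus speed, the following minimal sketch runs a plain linear consensus iteration, x(t+1) = W x(t), over one arm's mean estimates on a small team graph. It is not the optimization proposed in the paper, whose consensus formula and convex program are not given in this excerpt. The graph, the initial estimates, the constant weight 0.1, and the Metropolis weight rule are all illustrative assumptions chosen only to compare a hand-picked weighting with a degree-aware one.

```python
# Illustrative sketch only: the weight rules below are standard textbook
# choices, NOT the edge-weight optimization proposed in the paper.

def degrees(n, edges):
    """Number of neighbors of each agent in an undirected graph."""
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    return deg

def build_matrix(n, edge_weights):
    """Symmetric consensus matrix W whose rows sum to 1."""
    W = [[0.0] * n for _ in range(n)]
    for (i, j), w in edge_weights.items():
        W[i][j] = W[j][i] = w
    for i in range(n):
        # Diagonal is still 0 here, so sum(W[i]) covers off-diagonals only.
        W[i][i] = 1.0 - sum(W[i])
    return W

def constant_weights(n, edges, eps):
    """Every edge gets the same hand-picked weight eps (assumed value)."""
    return build_matrix(n, {e: eps for e in edges})

def metropolis_weights(n, edges):
    """Metropolis rule: w_ij = 1 / (1 + max(deg_i, deg_j))."""
    deg = degrees(n, edges)
    return build_matrix(
        n, {(i, j): 1.0 / (1 + max(deg[i], deg[j])) for i, j in edges}
    )

def run_consensus(W, x0, tol=1e-6, max_steps=10_000):
    """Iterate x <- W x until all agents agree within tol."""
    x = list(x0)
    for step in range(max_steps):
        if max(x) - min(x) < tol:
            return step, x
        x = [sum(w * v for w, v in zip(row, x)) for row in W]
    return max_steps, x

# Hypothetical 5-agent team graph and local estimates of one arm's mean.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 4), (0, 2)]
estimates = [0.2, 0.9, 0.4, 0.7, 0.3]

steps_const, x_const = run_consensus(constant_weights(5, edges, 0.1), estimates)
steps_metro, x_metro = run_consensus(metropolis_weights(5, edges), estimates)

print("constant weights:  ", steps_const, "steps")
print("Metropolis weights:", steps_metro, "steps")
```

Both weightings drive every agent to the team average of the initial estimates, because each matrix is symmetric with rows summing to 1; only the number of iterations differs. That gap is what a weight-optimization step would aim to shrink.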
Oct-30-2024
- Country:
- North America > United States > Massachusetts > Middlesex County > Lowell (0.15)
- North America > United States > Texas > Bexar County > San Antonio (0.14)
- Genre:
- Research Report (1.00)
- Industry:
- Education (0.34)