On the optimal regret of collaborative personalized linear bandits

Huang, Bruce, Zhou, Ruida, Yang, Lin F., Diggavi, Suhas

Jun-23-2025–arXiv.org Artificial Intelligence

Stochastic linear bandits are a fundamental model in sequential decision-making, where in each round, an agent selects a vector-valued action and receives a stochastic reward with expectation equal to the inner product between the action and an unknown parameter vector [14]. This setting has been extensively studied in the single-agent case, where the goal is to maximize cumulative reward over time through efficient exploration and exploitation [19, 12, 1]. In this paper, we consider a generalized multi-agent setting, motivated by many practical scenarios such as distributed decision systems [13, 24], personalized adaptive interfaces [16, 11], and multi-device learning [4]--where in each round, a set of m agents must each select an action and receive feedback from a stochastic, potentially heterogeneous reward function. A natural baseline approach is to run a standard single-agent linear bandit algorithm independently for each agent. While straightforward, this approach entirely ignores potential similarities across agents. When the agents' environments are similar, such independent learning leads to redundant exploration and a suboptimal use of data. In contrast, our goal is to develop collaborative learning algorithms that adaptively share information across agents--without knowing in advance how similar or different they are. This raises a fundamental question: How can collaboration reduce regret under unknown heterogeneity? To address the question, we model heterogeneity using a hierarchical/empirical Bayesian framework, where each agent's unknown parameter vector is drawn from an unknown population distribution.

artificial intelligence, heterogeneity, machine learning, (17 more...)

arXiv.org Artificial Intelligence

Jun-23-2025

arXiv.org PDF

Add feedback

Country:
- North America > United States > California (0.28)

Genre:
- Research Report (1.00)

Industry:
- Education (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Agents (1.00)
  - Machine Learning (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found