Generalizing distribution of partial rewards for multi-armed bandits with temporally-partitioned rewards

Broek, Ronald C. van den, Litjens, Rik, Sagis, Tobias, Siecker, Luc, Verbeeke, Nina, Gajane, Pratik

Nov-13-2022–arXiv.org Artificial Intelligence

We investigate the Multi-Armed Bandit problem with Temporally-Partitioned Rewards (TP-MAB) setting in this paper. In the TP-MAB setting, an agent will receive subsets of the reward over multiple rounds rather than the entire reward for the arm all at once. In this paper, we introduce a general formulation of how an arm's cumulative reward is distributed across several rounds, called Beta-spread property. Such a generalization is needed to be able to handle partitioned rewards in which the maximum reward per round is not distributed uniformly across rounds. We derive a lower bound on the TP-MAB problem under the assumption that Beta-spread holds. Moreover, we provide an algorithm TP-UCB-FR-G, which uses the Beta-spread property to improve the regret upper bound in some scenarios. By generalizing how the cumulative reward is distributed, this setting is applicable in a broader range of applications.

artificial intelligence, big data, data mining, (18 more...)

arXiv.org Artificial Intelligence

Nov-13-2022

arXiv.org PDF

Add feedback

Country:
- North America > United States
  - California (0.04)
- Europe > Netherlands
  - North Brabant > Eindhoven (0.06)

Genre:
- Research Report (0.50)
- Instructional Material > Online (0.46)

Industry:
- Education
  - Educational Setting > Online (1.00)
  - Educational Technology > Educational Software
    - Computer Based Training (0.93)

Technology:
- Information Technology
  - Artificial Intelligence (1.00)
  - Enterprise Applications > Human Resources
    - Learning Management (1.00)
  - Data Science > Data Mining
    - Big Data (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found