Stochastic Graph Bandit Learning with Side-Observations

Xueping Gong, Jiheng Zhang

arXiv.org Artificial Intelligence 

The bandit framework has garnered significant attention from the online learning community due to its wide applicability in fields such as recommendation systems, portfolio selection, and clinical trials [21]. A significant aspect of sequential decision making within this framework is side observations, which may take the form of feedback from multiple sources [25] or contextual knowledge about the environment [1, 2]; these two settings are typically modeled as feedback graphs and contextual bandits, respectively. The multi-armed bandit framework with feedback graphs has matured into a solid theoretical foundation for incorporating additional feedback into the exploration strategy [4, 7, 3]. The contextual bandit problem is another well-established framework for decision-making under uncertainty [20, 11, 1]. Yet, despite the considerable attention given to non-contextual bandits with feedback graphs, contextual bandits with feedback graphs remain comparatively underexplored [32, 30, 28].
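To make the graph-feedback setting concrete, the following is a minimal sketch (not the paper's algorithm): in each round the learner pulls one arm and observes, in addition to that arm's reward, the rewards of its out-neighbors in a feedback graph. The three-arm graph, Bernoulli means, and greedy policy below are illustrative assumptions chosen only to show how side-observations accelerate estimation.

```python
import random

# Hypothetical instance: true Bernoulli means and a feedback graph mapping
# each arm to the out-neighbors whose rewards are observed for free.
MEANS = [0.2, 0.5, 0.8]
GRAPH = {0: [1], 1: [0, 2], 2: [1]}

def play_round(arm, rng):
    """Pull `arm`; return {arm_index: reward} for the arm and its neighbors."""
    observed = {arm: int(rng.random() < MEANS[arm])}
    for nb in GRAPH[arm]:
        observed[nb] = int(rng.random() < MEANS[nb])
    return observed

def run(horizon=2000, seed=0):
    rng = random.Random(seed)
    n = len(MEANS)
    counts = [0] * n          # observations per arm, including side-observations
    sums = [0.0] * n
    for _ in range(horizon):
        # Greedy on empirical means; unobserved arms get priority (inf).
        est = [sums[i] / counts[i] if counts[i] else float("inf")
               for i in range(n)]
        arm = max(range(n), key=lambda i: est[i])
        for i, r in play_round(arm, rng).items():
            counts[i] += 1
            sums[i] += r
    return counts, sums
```

Note that every pull yields at least two observations here, so each arm's estimate improves faster than its pull count alone would allow; in this toy graph the side-observations also let the greedy policy escape a suboptimal arm, since pulling arm 1 keeps refreshing arm 2's estimate.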