Stochastic Graph Bandit Learning with Side-Observations
arXiv.org Artificial Intelligence
The bandit framework has garnered significant attention from the online learning community due to its widespread applicability in diverse fields such as recommendation systems, portfolio selection, and clinical trials [21]. Among the significant aspects of sequential decision making within this framework are side observations, which can take the form of feedback from multiple sources [25] or contextual knowledge about the environment [1, 2]. These are typically modeled as feedback graphs and contextual bandits, respectively. The multi-armed bandit framework with feedback graphs has emerged as a mature approach, providing a solid theoretical foundation for incorporating additional feedback into the exploration strategy [4, 7, 3]. The contextual bandit problem is another well-established framework for decision-making under uncertainty [20, 11, 1]. Despite the considerable attention given to non-contextual bandits with feedback graphs, the exploration of contextual bandits with feedback graphs has been limited [32, 30, 28].
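To make the feedback-graph idea concrete, the following is a minimal illustrative sketch (not the paper's algorithm): a stochastic multi-armed bandit where pulling an arm also reveals the rewards of its neighbors in a feedback graph, and the learner uses a simple UCB-style index over all observations. The number of arms, reward means, graph, and index rule are all hypothetical assumptions chosen for illustration.

```python
# Sketch of a stochastic bandit with graph feedback (side observations):
# pulling an arm reveals the rewards of that arm and its graph neighbors.
# The graph, reward means, and UCB-style index are illustrative assumptions.
import math
import random

K = 4                                    # number of arms (hypothetical)
means = [0.2, 0.5, 0.7, 0.4]             # Bernoulli reward means (hypothetical)
# Feedback graph: pulling arm i also observes rewards of its neighbors.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

counts = [0] * K                         # observations per arm (played or side-observed)
sums = [0.0] * K                         # sum of observed rewards per arm

def ucb_index(i, t):
    """Upper-confidence index built from all observations of arm i."""
    if counts[i] == 0:
        return float("inf")
    mean = sums[i] / counts[i]
    return mean + math.sqrt(2 * math.log(t + 1) / counts[i])

T = 1000
for t in range(T):
    arm = max(range(K), key=lambda i: ucb_index(i, t))
    # Side observations: update the played arm and every neighbor it exposes.
    for j in [arm] + neighbors[arm]:
        reward = 1.0 if random.random() < means[j] else 0.0
        counts[j] += 1
        sums[j] += reward

print("empirical means:",
      [round(s / c, 2) if c else None for s, c in zip(sums, counts)])
```

Because each pull also updates neighboring arms, exploration is cheaper than in the standard bandit setting, which is the intuition behind the improved regret guarantees in the feedback-graph literature cited above.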
Jan-6-2024