Combinatorial Multi-Armed Bandit with General Reward Functions
Wei Chen, Wei Hu, Fu Li, Jian Li, Yu Liu, Pinyan Lu
–Neural Information Processing Systems
In this paper, unless otherwise specified, we use MAB to refer to stochastic MAB. The MAB problem illustrates the key tradeoff between exploration and exploitation: whether the player should stick to the choice that has performed best so far or try less-explored alternatives that may provide better rewards.
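To make the exploration/exploitation tradeoff concrete, the following is a minimal sketch of an epsilon-greedy player on a stochastic bandit with Bernoulli rewards. Epsilon-greedy is a standard textbook heuristic used here purely for illustration; it is not the algorithm proposed in the paper. The function name `epsilon_greedy`, the arm means, and the parameter values are all assumptions made for this example.

```python
import random


def epsilon_greedy(means, rounds=5000, epsilon=0.1, seed=0):
    """Play a Bernoulli bandit; return (pulls per arm, total reward).

    `means` holds the hidden success probability of each arm
    (illustrative values, unknown to the player).
    """
    rng = random.Random(seed)
    k = len(means)
    pulls = [0] * k      # number of times each arm was played
    totals = [0.0] * k   # sum of rewards observed per arm
    reward_sum = 0.0
    for t in range(rounds):
        if t < k:
            arm = t  # initialization: play every arm once
        elif rng.random() < epsilon:
            arm = rng.randrange(k)  # explore: pick an arm uniformly at random
        else:
            # exploit: pick the arm with the best empirical mean so far
            arm = max(range(k), key=lambda i: totals[i] / pulls[i])
        reward = 1.0 if rng.random() < means[arm] else 0.0
        pulls[arm] += 1
        totals[arm] += reward
        reward_sum += reward
    return pulls, reward_sum


pulls, reward_sum = epsilon_greedy([0.2, 0.5, 0.8])
print(pulls, reward_sum)
```

With `epsilon=0` the player always exploits and can lock onto a suboptimal arm after a few unlucky draws; with `epsilon=1` it only explores and never benefits from what it has learned. Intermediate values trade one against the other.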
Nov-21-2025, 08:45:38 GMT