Categorized Bandits
Neural Information Processing Systems
In the multi-armed bandit problem, an agent has several possible decisions, usually referred to as "arms", and sequentially chooses, or "pulls", one of them at each time step. Each pull generates a reward, and the objective is to maximize the cumulative sum of these rewards.
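The interaction loop described above can be sketched in a few lines of Python. This is only an illustration of the generic bandit setting, not the method of the paper: the Bernoulli reward model, the epsilon-greedy rule, and the names `run_bandit`, `means`, `horizon`, and `epsilon` are all assumptions introduced for the example.

```python
import random


def run_bandit(means, horizon, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy agent on Bernoulli arms (illustrative only).

    means   -- hypothetical success probability of each arm
    horizon -- number of time steps (one pull per step)
    Returns (cumulative_reward, pull_counts).
    """
    rng = random.Random(seed)
    n_arms = len(means)
    counts = [0] * n_arms      # number of pulls per arm
    sums = [0.0] * n_arms      # total reward collected per arm
    total = 0.0                # cumulative reward, the quantity to maximize

    for t in range(horizon):
        if t < n_arms:
            arm = t                              # pull every arm once first
        elif rng.random() < epsilon:
            arm = rng.randrange(n_arms)          # explore: random arm
        else:
            # exploit: arm with the highest empirical mean so far
            arm = max(range(n_arms), key=lambda a: sums[a] / counts[a])

        # The environment returns a Bernoulli reward for the pulled arm.
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total += reward

    return total, counts


if __name__ == "__main__":
    total, counts = run_bandit([0.2, 0.5, 0.8], horizon=1000)
    print("cumulative reward:", total)
    print("pulls per arm:", counts)
```

The agent only observes the reward of the arm it pulls, so it has to balance trying arms it knows little about against pulling the arm that currently looks best; epsilon-greedy is one simple way to make that trade-off.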
Feb-12-2026, 19:06:02 GMT