Online Multi-Armed Bandits with Adaptive Inference
Neural Information Processing Systems
During online decision making in multi-armed bandits, one needs to conduct inference on the true mean reward of each arm at every step, based on the data collected so far.
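As a rough illustration of this setting, the sketch below keeps a running estimate of each arm's mean reward and updates it after every pull. It makes several assumptions that are not taken from the source: the estimator is the plain sample mean, the arm-selection rule is epsilon-greedy, rewards are Gaussian with unit variance, and the name `run_bandit` and its parameters are hypothetical.

```python
import random


def run_bandit(true_means, n_steps, epsilon=0.1, seed=0):
    """Simulate a bandit and track a per-arm sample-mean estimate.

    Assumptions (illustrative only): epsilon-greedy selection,
    Gaussian rewards with unit variance, sample mean as the estimator.
    """
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k          # number of pulls per arm so far
    estimates = [0.0] * k     # running sample mean of rewards per arm

    for _ in range(n_steps):
        # Explore at random with probability epsilon, or while some arm
        # has never been pulled; otherwise exploit the current best estimate.
        if rng.random() < epsilon or 0 in counts:
            arm = rng.randrange(k)
        else:
            arm = max(range(k), key=lambda a: estimates[a])

        reward = rng.gauss(true_means[arm], 1.0)

        # Incremental update of the sample mean using only data seen so far.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return counts, estimates


if __name__ == "__main__":
    counts, estimates = run_bandit([0.2, 0.5, 0.8], n_steps=500)
    for arm, (n, est) in enumerate(zip(counts, estimates)):
        print(f"arm {arm}: pulls={n}, estimated mean={est:.3f}")
```

The incremental update avoids storing the full reward history, so the estimate for each arm is available at every step at constant cost per pull.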
Feb-7-2026, 12:26:49 GMT