Online Multi-Armed Bandits with Adaptive Inference
Neural Information Processing Systems
In online decision making with multi-armed bandits, one must, at each step, conduct inference on the true mean reward of each arm based on the data collected so far. However, since the arms are adaptively selected, thereby yielding non-i.i.d.