Online Multi-Armed Bandits with Adaptive Inference

Neural Information Processing Systems 

During online decision making in multi-armed bandits, one needs to conduct inference on the true mean reward of each arm based on data collected so far at each step. However, since the arms are adaptively selected, thereby yielding non-i.i.d.
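
To illustrate why adaptive arm selection complicates inference on mean rewards, the following is a minimal simulation sketch. It is not the paper's method. It assumes a plain greedy selection rule, unit-variance Gaussian rewards, and two arms with equal true means. The names `greedy_run` and `estimate_bias` are hypothetical, chosen only for this example. It estimates how far each arm's final sample mean falls from its true mean when averaged over many independent runs.

```python
import random


def greedy_run(horizon, true_means, rng):
    """Run one greedy bandit episode; return each arm's final sample mean.

    Assumes horizon >= number of arms so every arm is pulled at least once.
    """
    n_arms = len(true_means)
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(horizon):
        if t < n_arms:
            # Pull every arm once to initialise its estimate.
            arm = t
        else:
            # Greedy rule (an assumption of this sketch): choose the arm
            # with the highest current sample mean, so the choice depends
            # on past rewards and the data are not i.i.d.
            arm = max(range(n_arms), key=lambda a: sums[a] / counts[a])
        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        sums[arm] += reward
    return [sums[a] / counts[a] for a in range(n_arms)]


def estimate_bias(n_runs=5000, horizon=50, true_means=(0.0, 0.0), seed=0):
    """Average (sample mean - true mean) per arm over independent runs."""
    rng = random.Random(seed)
    totals = [0.0] * len(true_means)
    for _ in range(n_runs):
        means = greedy_run(horizon, true_means, rng)
        for a, m in enumerate(means):
            totals[a] += m
    return [totals[a] / n_runs - true_means[a] for a in range(len(true_means))]
```

Calling `estimate_bias()` returns one bias estimate per arm. Under this greedy rule the estimates should come out below zero: an arm whose early rewards happen to be low is rarely pulled again, so its low sample mean is never corrected, whereas an arm with lucky early rewards keeps being pulled and regresses toward its true mean.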
