Stochastic Bandits with Context Distributions

Neural Information Processing Systems 

We introduce a stochastic contextual bandit model in which, at each time step, the environment chooses a distribution over a context set and samples the context from this distribution. The learner observes only the context distribution, while the exact context realization remains hidden.
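
The interaction protocol described above can be illustrated with a minimal simulation sketch. It is not the paper's algorithm. The linear reward model, the feature array `features`, the parameter `theta`, the Dirichlet choice of context distribution, and the greedy placeholder policy are all assumptions made for illustration only. The learner is handed the context distribution and never the sampled context.

```python
import numpy as np

def expected_features(context_dist, features):
    """Average the (hypothetical) feature vectors over the context distribution.

    features has shape (n_contexts, n_actions, d); the result has shape (n_actions, d).
    """
    return np.tensordot(context_dist, features, axes=1)

def play_round(rng, theta, features, learner_policy, noise_std=0.1):
    """Simulate one round: the learner sees the distribution, not the context."""
    n_contexts = features.shape[0]
    # The environment chooses a distribution over the context set
    # (drawn from a Dirichlet here purely as an illustrative assumption).
    context_dist = rng.dirichlet(np.ones(n_contexts))
    # The learner observes only the distribution and picks an action.
    action = learner_policy(context_dist)
    # The environment samples the context; it stays hidden from the learner.
    context = rng.choice(n_contexts, p=context_dist)
    # Assumed linear reward with Gaussian noise.
    reward = float(features[context, action] @ theta + noise_std * rng.normal())
    return context_dist, action, reward

rng = np.random.default_rng(0)
n_contexts, n_actions, d = 4, 3, 2
features = rng.normal(size=(n_contexts, n_actions, d))
theta = rng.normal(size=d)

def greedy_policy(context_dist):
    # Placeholder policy: it uses the true theta, which a real learner
    # would not know and would have to estimate from observed rewards.
    return int(np.argmax(expected_features(context_dist, features) @ theta))

context_dist, action, reward = play_round(rng, theta, features, greedy_policy)
```

The only information passed to `learner_policy` is `context_dist`, which mirrors the setting in which the realized context is never revealed to the learner.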
