Stochastic Bandits with Context Distributions
Neural Information Processing Systems
We introduce a stochastic contextual bandit model where at each time step the environment chooses a distribution over a context set and samples the context from this distribution. The learner observes only the context distribution while the exact context realization remains hidden.
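The interaction protocol described above can be illustrated with a minimal simulation of a single round. This is only a sketch under stated assumptions: the finite scalar context set `CONTEXTS`, the per-action linear reward with slopes `THETA`, the Gaussian noise level, and the greedy rule `greedy_on_expected_context` are all hypothetical choices made for illustration and are not taken from the source. The one point it is meant to show is the information structure: the learner's decision function receives the distribution, never the sampled context.

```python
import random

# Hypothetical finite context set (scalar contexts), assumed for illustration.
CONTEXTS = [-1.0, 0.0, 1.0]
# Hypothetical per-action reward slopes; unknown to the learner.
THETA = [0.5, -0.5]


def choose_distribution(rng):
    """Environment: pick a probability distribution over CONTEXTS."""
    weights = [rng.random() + 1e-9 for _ in CONTEXTS]
    total = sum(weights)
    return [w / total for w in weights]


def greedy_on_expected_context(dist, theta_hat):
    """Learner: sees only `dist` and scores each action at the mean context.

    `theta_hat` is the learner's current (assumed) estimate of the slopes.
    """
    mean_context = sum(p * c for p, c in zip(dist, CONTEXTS))
    scores = [t * mean_context for t in theta_hat]
    return max(range(len(scores)), key=scores.__getitem__)


def play_round(rng, theta_hat):
    """Simulate one round: distribution -> hidden context -> action -> reward."""
    dist = choose_distribution(rng)
    # The realized context is sampled from `dist` but never shown to the learner.
    hidden = rng.choices(CONTEXTS, weights=dist, k=1)[0]
    action = greedy_on_expected_context(dist, theta_hat)
    # Assumed reward model: linear in the hidden context plus Gaussian noise.
    reward = THETA[action] * hidden + rng.gauss(0.0, 0.1)
    return dist, action, reward
```

For example, `play_round(random.Random(0), theta_hat=[0.4, -0.4])` returns the observed distribution, the chosen action index, and a noisy reward that depends on the context the learner never saw.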
Aug-19-2025, 22:47:16 GMT