Uncertainty


EnsembleSampling_Final

Neural Information Processing Systems

Ensemble sampling serves as a practical approximation to Thompson sampling when maintaining an exact posterior distribution over model parameters is computationally intractable. In this paper, we establish a regret bound that ensures desirable behavior when ensemble sampling is applied to the linear bandit problem.
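
As an illustration only, the sketch below shows one common way ensemble sampling is set up for a linear bandit with a Gaussian prior: each ensemble member is a regularized least-squares fit to its own randomly perturbed copy of the rewards, and at every step one member is drawn uniformly and followed greedily. The abstract does not specify these details, so the function name `ensemble_sampling`, its parameters, the zero-mean prior, and the perturbation scheme are all assumptions made for this example, not the paper's algorithm as stated.

```python
import numpy as np


def ensemble_sampling(actions, theta_star, n_models=10, horizon=200,
                      noise_std=1.0, prior_std=1.0, seed=0):
    """Hypothetical sketch: ensemble sampling on a linear bandit.

    actions:    (n_actions, dim) array of feature vectors.
    theta_star: (dim,) true parameter, used only to simulate rewards.
    Returns the cumulative regret after each of `horizon` steps.
    """
    rng = np.random.default_rng(seed)
    n_actions, dim = actions.shape

    # All members see the same actions, so they share one precision matrix.
    precision = np.eye(dim) / prior_std**2

    # Each member starts from its own draw from the (assumed) N(0, prior_std^2 I) prior.
    prior_draws = rng.normal(0.0, prior_std, size=(n_models, dim))
    b = prior_draws / prior_std**2  # per-member right-hand side, shape (n_models, dim)

    best_mean = (actions @ theta_star).max()
    regret = np.zeros(horizon)

    for t in range(horizon):
        # Draw one member uniformly and act greedily under its parameter estimate.
        m = rng.integers(n_models)
        theta_m = np.linalg.solve(precision, b[m])
        x = actions[int(np.argmax(actions @ theta_m))]

        # Simulated noisy reward from the environment.
        reward = x @ theta_star + rng.normal(0.0, noise_std)

        # Update every member with an independently perturbed copy of the reward.
        precision += np.outer(x, x) / noise_std**2
        perturbed = reward + rng.normal(0.0, noise_std, size=n_models)
        b += np.outer(perturbed, x) / noise_std**2

        regret[t] = best_mean - x @ theta_star

    return regret.cumsum()
```

The independent perturbations are what keep the members spread out, so that picking one at random roughly mimics drawing from a posterior. With exact Thompson sampling, a fresh posterior sample would replace the uniform draw over members.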


Sample-Efficient Reinforcement Learning of Undercomplete POMDPs

Neural Information Processing Systems

In many sequential decision-making settings, the agent lacks complete information about the underlying state of the system, a phenomenon known as partial observability.