Thompson Sampling For Combinatorial Bandits: Polynomial Regret and Mismatched Sampling Paradox
Neural Information Processing Systems
We further exhibit the mismatched sampling paradox: a learner who knows the reward distributions and samples from the correct posterior can perform exponentially worse than a learner who does not know the rewards and simply samples from a well-chosen Gaussian posterior.
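To make the second learner concrete, here is a minimal sketch of Thompson sampling with a Gaussian sampling distribution on a simple combinatorial semi-bandit. It is an illustration under stated assumptions, not the paper's algorithm. The setting is assumed for illustration: arms have Bernoulli rewards, an action is any set of `m` of the `d` arms, the reward is the sum of the chosen arms' outcomes, and every chosen arm's outcome is observed. The function name `gaussian_ts_semi_bandit` and the variance choice `1 / (n_i + 1)` are hypothetical; the paper's "well-chosen" Gaussian posterior may differ.

```python
import random


def gaussian_ts_semi_bandit(means, m, horizon, seed=0):
    """Hypothetical sketch: Gaussian Thompson sampling for a top-m semi-bandit.

    Assumed setting (not from the paper): arm i yields a Bernoulli(means[i])
    outcome, an action is a set of m arms, the reward is the sum of the chosen
    arms' outcomes, and each chosen arm's outcome is observed (semi-bandit
    feedback). Returns (cumulative expected regret, per-arm play counts).
    """
    rng = random.Random(seed)
    d = len(means)
    counts = [0] * d    # number of observations of each arm
    sums = [0.0] * d    # sum of observed outcomes of each arm
    # Expected reward of the best set: the m largest means.
    best = sum(sorted(means, reverse=True)[:m])
    regret = 0.0
    for _ in range(horizon):
        # Draw one sample per arm from N(empirical mean, 1 / (n_i + 1)).
        # This variance is an illustrative assumption, not the paper's choice.
        samples = []
        for i in range(d):
            mean_hat = sums[i] / counts[i] if counts[i] > 0 else 0.0
            std = (1.0 / (counts[i] + 1)) ** 0.5
            samples.append(rng.gauss(mean_hat, std))
        # Combinatorial oracle for this action set: keep the m largest samples.
        action = sorted(range(d), key=lambda i: samples[i], reverse=True)[:m]
        # Semi-bandit feedback: update every chosen arm with its own outcome.
        for i in action:
            outcome = 1.0 if rng.random() < means[i] else 0.0
            counts[i] += 1
            sums[i] += outcome
        regret += best - sum(means[i] for i in action)
    return regret, counts
```

For example, `gaussian_ts_semi_bandit([0.9, 0.8, 0.3, 0.2, 0.1], m=2, horizon=2000)` should concentrate its plays on the first two arms. The learner never uses the Bernoulli form of the rewards; it only needs the Gaussian sampling rule. That property is what the paradox contrasts with sampling from the exact posterior.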
Nov-20-2025, 00:08:12 GMT
- Country:
- Europe > France (0.04)
- North America > United States (0.04)
- Genre:
- Research Report > Experimental Study (0.93)
- Technology: