Thompson Sampling for Multinomial Logit Contextual Bandits
Neural Information Processing Systems
The confidence set is updated using the revenue feedback revealed after each arm is pulled. Thompson sampling (TS) places a prior distribution over the parameters that define the reward distribution. At each step, a parameter value is sampled from the posterior distribution, and the arm that is optimal under the sampled parameter is pulled.
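The sample-then-act step described above can be sketched for the multinomial logit (MNL) setting as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm. We assume each arm is an assortment (a set of at most `max_size` items), item utilities are linear in the parameter (x_i·θ), the no-purchase option has utility 0, and the posterior is a hypothetical independent Gaussian with mean `theta_hat` and per-coordinate standard deviations `sigma`. All function names are hypothetical, and the posterior update from revenue feedback is omitted.

```python
import itertools
import math
import random
from typing import List, Sequence, Tuple

# Hypothetical helpers; the Gaussian posterior form below is an assumption, not the paper's.


def expected_revenue(assortment: Sequence[int], features: List[List[float]],
                     revenues: List[float], theta: List[float]) -> float:
    """MNL expected revenue: sum_i r_i * exp(u_i) / (1 + sum_j exp(u_j)) over offered items."""
    weights = [math.exp(sum(x * t for x, t in zip(features[i], theta))) for i in assortment]
    denom = 1.0 + sum(weights)  # the 1.0 is the no-purchase (outside) option
    return sum(revenues[i] * w for i, w in zip(assortment, weights)) / denom


def best_assortment(features: List[List[float]], revenues: List[float],
                    theta: List[float], max_size: int) -> Tuple[Tuple[int, ...], float]:
    """Brute-force search over all assortments of size 1..max_size (small item counts only)."""
    best: Tuple[int, ...] = ()
    best_rev = 0.0
    for k in range(1, max_size + 1):
        for s in itertools.combinations(range(len(features)), k):
            rev = expected_revenue(s, features, revenues, theta)
            if rev > best_rev:
                best, best_rev = s, rev
    return best, best_rev


def thompson_step(theta_hat: List[float], sigma: List[float], features: List[List[float]],
                  revenues: List[float], max_size: int, rng: random.Random) -> Tuple[int, ...]:
    """One TS round: sample theta from the assumed posterior, then pick the optimal assortment for it."""
    theta_tilde = [rng.gauss(m, s) for m, s in zip(theta_hat, sigma)]
    assortment, _ = best_assortment(features, revenues, theta_tilde, max_size)
    return assortment


if __name__ == "__main__":
    rng = random.Random(0)
    features = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    revenues = [1.0, 2.0, 3.0]
    print(thompson_step([0.2, -0.1], [0.5, 0.5], features, revenues, max_size=2, rng=rng))
```

Exploration comes entirely from the randomness of the sampled parameter: with `sigma` set to zero the step reduces to acting greedily on `theta_hat`. The exhaustive search grows combinatorially with the number of items, so it is only suitable for toy examples.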
Oct-2-2025, 12:52:45 GMT