Thompson Sampling for Multinomial Logit Contextual Bandits

Oct-2-2025, 12:52:45 GMT–Neural Information Processing Systems

The confidence set is updated based on the revenue feedback that is revealed after an arm is pulled. Thompson sampling (TS) assumes a prior distribution over the parameters that define the reward distribution. At each step, a parameter value is sampled from the posterior distribution, and the arm that is optimal under the sampled parameter is pulled.
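
The sample-then-act loop described above can be illustrated with a minimal sketch. This is not the multinomial logit contextual algorithm from the paper: it assumes a much simpler setting with independent Bernoulli arms and Beta(1, 1) priors, chosen only because the posterior update is a one-line count. The function name `thompson_sampling_bernoulli` and the parameters `true_means`, `n_rounds`, and `seed` are hypothetical and introduced for illustration.

```python
import random


def thompson_sampling_bernoulli(true_means, n_rounds, seed=0):
    """Beta-Bernoulli Thompson sampling sketch; returns pull counts per arm.

    Assumption: rewards are Bernoulli with fixed, unknown means. This stands
    in for the MNL contextual model, which needs an approximate posterior.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    # Beta(1, 1) (uniform) prior on each arm's success probability.
    alpha = [1.0] * n_arms
    beta = [1.0] * n_arms
    pulls = [0] * n_arms
    for _ in range(n_rounds):
        # Sample one parameter value per arm from its current posterior.
        sampled = [rng.betavariate(alpha[a], beta[a]) for a in range(n_arms)]
        # Pull the arm that is optimal under the sampled parameters.
        arm = max(range(n_arms), key=lambda a: sampled[a])
        # Feedback is revealed only after the pull; update that arm's posterior.
        reward = 1 if rng.random() < true_means[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


if __name__ == "__main__":
    print(thompson_sampling_bernoulli([0.2, 0.5, 0.8], n_rounds=2000))
```

In this sketch the randomness of the posterior sample drives exploration: arms with few pulls have wide posteriors and are occasionally sampled high, while the pull counts concentrate on the best arm as evidence accumulates.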

artificial intelligence, data mining, machine learning

Technology:
- Information Technology
  - Data Science > Data Mining (0.95)
  - Artificial Intelligence
    - Representation & Reasoning (1.00)
    - Machine Learning > Statistical Learning (0.47)
