Correlated Bandits for Dynamic Pricing via the ARC algorithm
Samuel Cohen, Tanut Treetanthiploet
In the multi-armed bandit problem, a decision maker must sequentially decide between acting to reveal data about a system and acting to generate profit. The central idea of the multi-armed bandit is that the agent has K 'options', or equivalently, a bandit with K arms, and must choose which arm to play at each time. Playing an arm yields a reward generated from a fixed but unknown distribution, which must be inferred 'on-the-fly'. In the classic multi-armed bandit problem, the reward of each arm is assumed to be independent of the others (Gittins and Jones [9], Agrawal [1], Lattimore and Szepesvári [13]), and it is the only observation obtained by the decision maker at each step. In practice, we often observe signals in addition to the rewards, and there is often correlation between the distributions of outcomes for different choices. For example, in a dynamic pricing problem (Dubé and Misra [6], Misra et al. [14]), an agent wants to fix the price of a single product, from a finite set of prices {c
Feb-8-2021
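The classic independent-arm setting described above can be sketched with a simple epsilon-greedy policy, where the agent estimates each arm's mean reward on-the-fly. This is an illustrative sketch only, not the ARC algorithm of the paper; the arm means and policy parameters below are assumptions for the example.

```python
import random

def epsilon_greedy_bandit(true_means, n_rounds=5000, epsilon=0.1, seed=0):
    """Play a K-armed Bernoulli bandit with an epsilon-greedy policy.

    true_means: success probability of each arm (unknown to the agent).
    Returns the empirical mean-reward estimates and pull counts per arm.
    """
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k
    estimates = [0.0] * k
    for _ in range(n_rounds):
        # Explore a random arm with probability epsilon;
        # otherwise exploit the arm with the best current estimate.
        if rng.random() < epsilon:
            arm = rng.randrange(k)
        else:
            arm = max(range(k), key=lambda a: estimates[a])
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the running mean reward for this arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts

estimates, counts = epsilon_greedy_bandit([0.2, 0.5, 0.8])
```

Because each arm's estimate is updated only from its own pulls, this sketch treats the arms as independent, which is precisely the assumption the correlated-bandit setting relaxes.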