Neural Contextual Bandits with Deep Representation and Shallow Exploration

Xu, Pan; Wen, Zheng; Zhao, Handong; Gu, Quanquan

Dec-3-2020–arXiv.org Machine Learning

Multi-armed bandits (MAB) (Auer et al., 2002; Audibert et al., 2009; Lattimore and Szepesvári, 2020) are a class of online decision-making problems in which an agent learns to maximize its expected cumulative reward while repeatedly interacting with a partially known environment. In each round, the agent follows a bandit algorithm (also called a strategy or policy) to adaptively choose an arm, and then observes and receives the reward associated with that arm. Since only the reward of the chosen arm is observed (bandit information feedback), a good bandit algorithm must resolve the exploration-exploitation dilemma: the trade-off between pulling the arm that looks best given existing knowledge or historical data (exploitation) and trying arms that have not yet been sufficiently explored (exploration). In many real-world applications, the agent can also access detailed contexts associated with the arms. For example, when a company chooses an advertisement to present to a user, the recommendation will be much more accurate if the company takes into account the content, specifications, and other features of the advertisements in the arm set, as well as the profile of the user. To encode such contextual information, contextual bandit models and algorithms have been developed and widely studied, both in theory and in practice (Dani et al., 2008; Rusmevichientong
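To make the exploration-exploitation trade-off concrete in the contextual setting, here is a minimal LinUCB-style sketch in Python with NumPy. It assumes that rewards are linear in the arm contexts with an unknown parameter vector (the hypothetical `theta_star`). Each arm is scored by its ridge-regression estimate (exploitation) plus `alpha` times a confidence width (exploration), and only the chosen arm's context and reward are used for the update, mirroring bandit feedback. This is a standard linear-UCB illustration, not the neural method this paper proposes. The function names, the simulated Gaussian contexts, and all parameter values are illustrative assumptions.

```python
import numpy as np


def linucb_choose(A, b, contexts, alpha):
    """Pick an arm with a LinUCB-style optimistic score (illustrative sketch).

    A: (d, d) regularized Gram matrix of previously chosen contexts.
    b: (d,) sum of reward-weighted previously chosen contexts.
    contexts: (K, d) one feature vector per arm in this round.
    alpha: weight on the confidence width (exploration strength).
    """
    A_inv = np.linalg.inv(A)
    theta_hat = A_inv @ b                      # ridge-regression estimate
    means = contexts @ theta_hat               # exploitation term per arm
    # Confidence width sqrt(x^T A^{-1} x) per arm: the exploration term.
    widths = np.sqrt(np.einsum("kd,de,ke->k", contexts, A_inv, contexts))
    return int(np.argmax(means + alpha * widths))


def run_linucb(theta_star, n_rounds=2000, n_arms=10, alpha=1.0, lam=1.0,
               noise=0.1, seed=0):
    """Simulate a linear contextual bandit; return the cumulative regret.

    Hypothetical setup: contexts are i.i.d. Gaussian vectors normalized to
    unit length, and rewards are linear in the context plus Gaussian noise.
    """
    rng = np.random.default_rng(seed)
    d = theta_star.shape[0]
    A = lam * np.eye(d)                        # ridge regularizer lam * I
    b = np.zeros(d)
    regret = 0.0
    for _ in range(n_rounds):
        X = rng.normal(size=(n_arms, d))
        X /= np.linalg.norm(X, axis=1, keepdims=True)
        arm = linucb_choose(A, b, X, alpha)
        expected = X @ theta_star
        # Bandit feedback: only the chosen arm's noisy reward is observed.
        reward = expected[arm] + noise * rng.normal()
        regret += expected.max() - expected[arm]
        # Update the statistics with the chosen arm's context only.
        A += np.outer(X[arm], X[arm])
        b += reward * X[arm]
    return regret
```

For example, `run_linucb(theta)` with a unit-norm five-dimensional `theta` returns the cumulative regret over 2,000 simulated rounds. Setting `alpha=0.0` turns the rule into a purely greedy (exploit-only) policy, which makes it easy to compare how much the exploration term contributes.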

algorithm, deep learning, upstream oil & gas

Country:
- North America > United States > California
  - Los Angeles County > Los Angeles (0.28)
  - Santa Clara County (0.14)

Genre:
- Research Report (0.64)

Industry:
- Energy > Oil & Gas > Upstream (0.48)

Technology:
- Information Technology
  - Data Science > Data Mining
    - Big Data (0.89)
  - Artificial Intelligence
    - Representation & Reasoning (1.00)
    - Machine Learning
      - Statistical Learning (1.00)
      - Reinforcement Learning (0.67)
      - Neural Networks > Deep Learning (0.48)
