Neural Thompson Sampling

Zhang, Weitong, Zhou, Dongruo, Li, Lihong, Gu, Quanquan

Oct-2-2020–arXiv.org Machine Learning

The stochastic multi-armed bandit (Bubeck and Cesa-Bianchi, 2012; Lattimore and Szepesvári, 2020) has been extensively studied as an important model for optimizing the tradeoff between exploration and exploitation in sequential decision making. Among its many variants, the contextual bandit is widely used in real-world applications such as recommendation (Li et al., 2010), advertising (Graepel et al., 2010), robotic control (Mahler et al., 2016), and healthcare (Greenewald et al., 2017). In each round of a contextual bandit, the agent observes a feature vector (the "context") for each of the K arms, pulls one of them, and in return receives a scalar reward. The goal is to maximize the cumulative reward, or equivalently to minimize regret (to be defined later), over a total of T rounds. To do so, the agent must find a tradeoff between exploration and exploitation. One of the most effective and widely used techniques is Thompson Sampling, or TS (Thompson, 1933). The basic idea is to compute the posterior distribution over which arm is optimal for the present context, and to sample an arm from this distribution. TS is often easy to implement, and has found great success in practice (Chapelle and Li, 2011; Graepel et al., 2010; Kawale et al., 2015; Russo et al., 2017). Recently, a line of work has applied TS or its variants to explore in contextual bandits with neural network models (Blundell et al., 2015; Kveton et al., 2020; Lu and Van Roy, 2017; Riquelme
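The contextual bandit protocol and the Thompson Sampling idea described above can be illustrated with a minimal sketch. It deliberately uses a linear-Gaussian reward model rather than the neural network model this paper studies, so it is not the authors' algorithm. The function name `run_linear_ts`, the simulated environment, and the parameters `nu`, `lam`, and `noise` are all assumptions made for illustration.

```python
import numpy as np

def run_linear_ts(T=500, K=4, d=5, nu=0.5, lam=1.0, noise=0.1, seed=0):
    """Simulate Thompson Sampling on a synthetic linear contextual bandit.

    Assumption: rewards are linear in the context plus Gaussian noise.
    This is a simplification, not the neural model of the paper.
    """
    rng = np.random.default_rng(seed)

    # Hidden parameter of the simulated environment (unknown to the agent).
    theta_star = rng.normal(size=d)
    theta_star /= np.linalg.norm(theta_star)

    # Sufficient statistics of the Gaussian posterior over theta.
    A = lam * np.eye(d)   # regularized precision matrix
    b = np.zeros(d)       # reward-weighted sum of chosen contexts

    arms = np.zeros(T, dtype=int)
    regret = np.zeros(T)

    for t in range(T):
        # The agent observes one context vector per arm.
        X = rng.normal(size=(K, d))
        X /= np.linalg.norm(X, axis=1, keepdims=True)

        # Sample a parameter from the current posterior.
        A_inv = np.linalg.inv(A)
        mean = A_inv @ b
        cov = nu**2 * A_inv
        cov = (cov + cov.T) / 2  # guard against numerical asymmetry
        theta_sample = rng.multivariate_normal(mean, cov)

        # Pull the arm that looks best under the sampled parameter.
        arm = int(np.argmax(X @ theta_sample))

        # Receive a noisy scalar reward from the environment.
        expected = X @ theta_star
        reward = expected[arm] + noise * rng.normal()

        # Update the posterior with the observed (context, reward) pair.
        A += np.outer(X[arm], X[arm])
        b += reward * X[arm]

        arms[t] = arm
        regret[t] = expected.max() - expected[arm]

    return arms, regret

if __name__ == "__main__":
    arms, regret = run_linear_ts()
    print("cumulative regret:", np.cumsum(regret)[-1])
```

Sampling a parameter and then acting greedily with respect to it is one common way to sample an arm according to its posterior probability of being optimal. The per-round regret recorded here is the gap between the best expected reward and the expected reward of the pulled arm, which is one standard definition and may differ in detail from the one the paper gives later.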
