Delay-Adaptive Learning in Generalized Linear Contextual Bandits

Jose Blanchet, Renyuan Xu, Zhengyuan Zhou

Mar-11-2020–arXiv.org Machine Learning

The growing availability of user-specific data has ushered in the exciting era of personalized recommendation, a paradigm that uncovers the heterogeneity across individuals and provides tailored service decisions that lead to improved outcomes. Such heterogeneity is ubiquitous across a variety of application domains (including online advertising, medical treatment assignment, and product/news recommendation [29, 9, 11, 7, 42]) and manifests itself as different individuals responding differently to the recommended items. Rising to this opportunity, contextual bandits [8, 39, 22, 1, 3] have emerged as the predominant mathematical formalism, providing an elegant and powerful formulation: its three core components, namely the features (representing individual characteristics), the actions (representing the recommendation), and the rewards (representing the observed feedback), capture the salient aspects of the problem and provide fertile ground for developing algorithms that balance exploring and exploiting users' heterogeneity. As such, the last decade has witnessed extensive research efforts in developing effective and efficient contextual bandit algorithms. In particular, two types of algorithms stand out from this flourishing and fruitful line of work: upper confidence bound (UCB) based algorithms [29, 20, 15, 26, 30] and Thompson sampling (TS) based algorithms [4, 5, 40, 41, 2]. Their theoretical guarantees have been analyzed in many settings, often yielding (near-)optimal regret bounds, and their empirical performance has been thoroughly validated, often providing insights into their practical efficacy. This includes the consensus that TS based algorithms, although sometimes computationally intensive because of posterior updates, are generally more effective than their UCB counterparts, whose performance can be sensitive to hyper-parameter tuning.
These two families of algorithms have been widely deployed in many modern recommendation engines.
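To make the three components (features, actions, rewards) and the two algorithm families concrete, here is a minimal sketch, not the authors' method. It assumes a plain linear reward model (the identity-link special case of a generalized linear model), one random feature vector per action per round, Gaussian noise, and immediate (undelayed) feedback. The names `LinearBandit`, `simulate`, `alpha`, and `v` are hypothetical. Both policies share one ridge-regression estimate; UCB adds a deterministic optimism bonus, while TS perturbs the parameter estimate at random.

```python
# Hypothetical illustration: linear (identity-link) rewards, immediate feedback, stdlib only.
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mat_vec(m, v):
    return [dot(row, v) for row in m]


def cholesky(m):
    """Lower-triangular L with L L^T = m, for a small symmetric positive-definite m."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][i] = math.sqrt(max(m[i][i] - s, 1e-12))
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low


class LinearBandit:
    """Ridge-regression learner for an assumed linear model E[r | x] = x . theta."""

    def __init__(self, d, lam=1.0):
        self.d = d
        # A^{-1}, where A = lam * I + sum of x x^T over observed (feature, reward) pairs.
        self.a_inv = [[1.0 / lam if i == j else 0.0 for j in range(d)] for i in range(d)]
        self.b = [0.0] * d  # running sum of r * x

    def theta_hat(self):
        return mat_vec(self.a_inv, self.b)

    def update(self, x, r):
        # Sherman-Morrison: (A + x x^T)^{-1} = A^{-1} - (A^{-1}x)(A^{-1}x)^T / (1 + x^T A^{-1} x).
        ax = mat_vec(self.a_inv, x)
        denom = 1.0 + dot(x, ax)
        for i in range(self.d):
            for j in range(self.d):
                self.a_inv[i][j] -= ax[i] * ax[j] / denom
        for i in range(self.d):
            self.b[i] += r * x[i]

    def select_ucb(self, arms, alpha=1.0):
        # Optimism: estimated reward plus alpha times the confidence width ||x||_{A^{-1}}.
        theta = self.theta_hat()

        def score(x):
            width = math.sqrt(max(dot(x, mat_vec(self.a_inv, x)), 0.0))
            return dot(x, theta) + alpha * width

        return max(range(len(arms)), key=lambda k: score(arms[k]))

    def select_ts(self, arms, v=1.0, rng=random):
        # Posterior sampling: draw theta ~ N(theta_hat, v^2 A^{-1}), then act greedily on the draw.
        theta = self.theta_hat()
        chol = cholesky(self.a_inv)
        z = [rng.gauss(0.0, 1.0) for _ in range(self.d)]
        sample = [t + v * e for t, e in zip(theta, mat_vec(chol, z))]
        return max(range(len(arms)), key=lambda k: dot(arms[k], sample))


def simulate(policy, theta_star, horizon=2000, n_arms=5, noise=0.1, seed=0):
    """Cumulative regret of 'ucb', 'ts', or 'random' on a hypothetical synthetic linear bandit."""
    rng = random.Random(seed)
    learner = LinearBandit(len(theta_star))
    regret = 0.0
    for _ in range(horizon):
        # Each round reveals one hypothetical feature vector per action (uniform on [-1, 1]^d).
        arms = [[rng.uniform(-1.0, 1.0) for _ in theta_star] for _ in range(n_arms)]
        means = [dot(x, theta_star) for x in arms]
        if policy == "ucb":
            k = learner.select_ucb(arms)
        elif policy == "ts":
            k = learner.select_ts(arms, rng=rng)
        else:
            k = rng.randrange(n_arms)
        reward = means[k] + rng.gauss(0.0, noise)  # feedback arrives immediately (no delay)
        learner.update(arms[k], reward)
        regret += max(means) - means[k]
    return regret, learner.theta_hat()


if __name__ == "__main__":
    theta_star = [1.0, -0.5, 0.25]  # hypothetical true parameter
    for policy in ("ucb", "ts", "random"):
        print(policy, round(simulate(policy, theta_star)[0], 1))
```

The exploration knobs `alpha` (UCB) and `v` (TS) correspond to the hyper-parameters the abstract mentions; the values here are illustrative defaults, not tuned or recommended settings. The delayed-feedback setting that the paper's title refers to is not modeled in this sketch.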

algorithm, bandit, contextual bandit

Country:
- North America > United States
  - New York (0.04)
  - Hawaii (0.04)
  - California > Santa Clara County
    - Palo Alto (0.04)
- Europe > United Kingdom
  - England
    - Oxfordshire > Oxford (0.14)
    - Cambridgeshire > Cambridge (0.04)

Genre:
- Research Report (0.64)

Industry:
- Health & Medicine > Pharmaceuticals & Biotechnology (0.55)
- Information Technology > Services (0.48)

Technology:
- Information Technology
  - Data Science > Data Mining
    - Big Data (0.66)
  - Artificial Intelligence
    - Representation & Reasoning > Personal Assistant Systems (0.54)
    - Machine Learning > Learning Graphical Models (0.46)