Contextual Bandits with Large Action Spaces: Made Practical

Zhu, Yinglun, Foster, Dylan J., Langford, John, Mineiro, Paul

Jul-12-2022–arXiv.org Artificial Intelligence

We consider the design of practical, theoretically motivated algorithms for sequential decision making with contextual information, better known as the contextual bandit problem. Here, a learning agent repeatedly receives a context (e.g., a user's profile), selects an action (e.g., a news article to display), and receives a reward (e.g., whether the article was clicked). Contextual bandits are a useful model for decision making in unknown environments in which both exploration and generalization are required, but pose significant algorithm design challenges beyond classical supervised learning.

Recent years have seen development on two fronts. On the theoretical side, extensive research into finite-action contextual bandits has resulted in practical, provably efficient algorithms capable of supporting flexible, general-purpose models (Langford and Zhang, 2007; Agarwal et al., 2014; Foster and Rakhlin, 2020; Simchi-Levi and Xu, 2021; Foster and Krishnamurthy, 2021). Empirically, contextual bandits have been widely deployed in practice for online personalization and recommendation tasks (Li et al., 2010; Agarwal et al., 2016; Tewari and Murphy, 2017; Cai et al., 2021), leveraging the availability of high-quality action slates (e.g., subsets of candidate articles selected by an editor).

The developments above critically rely on the existence of a small number of possible decisions or alternatives. However, many applications demand the ability to make contextual decisions in large, potentially continuous spaces, where actions might correspond to images in a database or high-dimensional embeddings of rich documents such as webpages. Contextual bandits in large (e.g., million-action) settings remain a major challenge, both statistically and computationally, and constitute a substantial gap between theory and practice. In particular, existing general-purpose algorithms (Langford and Zhang, 2007; Agarwal et al., 2014; Foster and Rakhlin, 2020; Simchi-Levi and Xu, 2021; Foster and Krishnamurthy, 2021) allow for the use of flexible models (e.g., neural networks, forests, or kernels) to facilitate generalization across contexts, but have sample complexity and computational requirements linear in the number of actions.
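
The context, action, reward loop described above can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the method proposed in the paper: it uses plain epsilon-greedy exploration with a tabular reward estimate, and the environment (two context types with random click probabilities) and the name `run_bandit` are hypothetical. It counts how many actions are scored per round to show the per-round cost that grows linearly with the number of actions.

```python
import random


def run_bandit(num_rounds, num_actions, epsilon=0.1, seed=0):
    """Simulate a finite-action contextual bandit with epsilon-greedy.

    Returns (total_reward, action_evaluations).
    """
    rng = random.Random(seed)
    num_contexts = 2  # assumption: two hypothetical user types

    # Hypothetical environment: click probability per (context, action).
    click_prob = [[rng.random() for _ in range(num_actions)]
                  for _ in range(num_contexts)]

    # Running mean reward estimate per (context, action).
    counts = [[0] * num_actions for _ in range(num_contexts)]
    means = [[0.0] * num_actions for _ in range(num_contexts)]

    total_reward = 0
    action_evaluations = 0

    for _ in range(num_rounds):
        # 1. Receive a context.
        context = rng.randrange(num_contexts)

        # 2. Score every action: this scan is the cost linear in num_actions.
        scores = [means[context][a] for a in range(num_actions)]
        action_evaluations += num_actions

        # 3. Select an action: explore uniformly, otherwise act greedily.
        if rng.random() < epsilon:
            action = rng.randrange(num_actions)
        else:
            action = max(range(num_actions), key=lambda a: scores[a])

        # 4. Receive a reward for the chosen action only (bandit feedback).
        reward = 1 if rng.random() < click_prob[context][action] else 0

        # 5. Update the estimate for the chosen (context, action) pair.
        counts[context][action] += 1
        n = counts[context][action]
        means[context][action] += (reward - means[context][action]) / n

        total_reward += reward

    return total_reward, action_evaluations


if __name__ == "__main__":
    total, evals = run_bandit(num_rounds=1000, num_actions=10)
    print("total reward:", total)
    print("action evaluations:", evals)
```

With `num_actions` in the millions, the scan in step 2 alone would dominate each round, which is the computational bottleneck the text refers to. The tabular estimate used here also does not generalize across contexts; the flexible models mentioned above would take its place in a realistic setting.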

algorithm, contextual bandit, optimal design
