Contextual Bandits with Large Action Spaces: Made Practical
Zhu, Yinglun, Foster, Dylan J., Langford, John, Mineiro, Paul
–arXiv.org Artificial Intelligence
We consider the design of practical, theoretically motivated algorithms for sequential decision making with contextual information, better known as the contextual bandit problem. Here, a learning agent repeatedly receives a context (e.g., a user's profile), selects an action (e.g., a news article to display), and receives a reward (e.g., whether the article was clicked). Contextual bandits are a useful model for decision making in unknown environments in which both exploration and generalization are required, but pose significant algorithm design challenges beyond classical supervised learning. Recent years have seen development on two fronts: On the theoretical side, extensive research into finite-action contextual bandits has resulted in practical, provably efficient algorithms capable of supporting flexible, general-purpose models (Langford and Zhang, 2007; Agarwal et al., 2014; Foster and Rakhlin, 2020; Simchi-Levi and Xu, 2021; Foster and Krishnamurthy, 2021). Empirically, contextual bandits have been widely deployed in practice for online personalization and recommendation tasks (Li et al., 2010; Agarwal et al., 2016; Tewari and Murphy, 2017; Cai et al., 2021), leveraging the availability of high-quality action slates (e.g., subsets of candidate articles selected by an editor). The developments above critically rely on the existence of a small number of possible decisions or alternatives. However, many applications demand the ability to make contextual decisions in large, potentially continuous spaces, where actions might correspond to images in a database or high-dimensional embeddings of rich documents such as webpages. Contextual bandits in large (e.g., million-action) settings remains a major challenge--both statistically and computationally--and constitutes a substantial gap between theory and practice. In particular: Existing general-purpose algorithms (Langford and Zhang, 2007; Agarwal et al., 2014; Foster and Rakhlin, 2020; Simchi-Levi and Xu, 2021; Foster and Krishnamurthy, 2021) allow for the use of flexible models (e.g., neural networks, forests, or kernels) to facilitate generalization across contexts, but have sample complexity and computational requirements linear in the number of actions.
arXiv.org Artificial Intelligence
Jul-12-2022
- Country:
- North America > United States
- Wisconsin > Dane County
- Madison (0.04)
- Massachusetts > Hampshire County
- Amherst (0.04)
- Wisconsin > Dane County
- Europe > United Kingdom
- England > Cambridgeshire > Cambridge (0.04)
- North America > United States
- Genre:
- Research Report (1.00)
- Technology: