Infinite Action Contextual Bandits with Reusable Data Exhaust
Mark Rucker, Yinglun Zhu, Paul Mineiro
arXiv.org Artificial Intelligence
Those who ignore history are doomed to repeat it. A modern variant of this truth arises in controlled experimentation platforms, where offline procedures are a critical complement to online tests, e.g., supporting counterfactual evaluation strategies (Agarwal et al., 2016), offline model selection (Li et al., 2015), and prioritization of scarce online experimental resources (Gomez-Uribe & Hunt, 2015). Consequently, the utility of a learning algorithm is not solely determined by online performance, but also by the post-hoc utility of its data exhaust. The recent contribution of Zhu & Mineiro (2022) exemplifies this tension: an online contextual bandit algorithm for infinite action spaces with O(1) space and time complexity with respect to the action set. Unfortunately, this performance is achieved by sampling from a distribution which is not absolutely continuous with respect to the reference measure. Therefore, a variety of post-hoc evaluation procedures that rely on importance weighting cannot be applied, limiting adoption. In this paper, we describe an alternative approach to infinite action spaces which not only enjoys a similar smooth regret guarantee (and empirical performance), but also utilizes sampling distributions with well-defined importance weights. In exchange, we pay an increased computational cost.
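To see why absolute continuity matters for reusing data exhaust, consider a minimal inverse-propensity-scoring (IPS) sketch of counterfactual evaluation. This is an illustrative assumption of the generic technique, not the paper's implementation; the function name and record layout are hypothetical. The importance weight target_prob / logging_prob is only well defined when the logging policy assigns positive probability everywhere the target policy does.

```python
def ips_estimate(logged, target_prob):
    """Inverse-propensity-scoring estimate of a target policy's value.

    logged: iterable of (context, action, reward, logging_prob) tuples
            recorded by the deployed (logging) policy.
    target_prob: function (context, action) -> probability the target
            policy would have chosen that action.

    Requires absolute continuity: the logging policy must place positive
    probability on every action the target policy can choose.
    """
    total = 0.0
    n = 0
    for context, action, reward, logging_prob in logged:
        if logging_prob <= 0.0:
            # Importance weight is undefined: the logging distribution is
            # not absolutely continuous w.r.t. the target at this action.
            raise ValueError("logging probability must be positive")
        total += reward * target_prob(context, action) / logging_prob
        n += 1
    return total / n
```

For example, with a uniform logging policy over two actions and a target policy that always plays action 0, the estimator recovers the reward of action 0 in expectation.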
Jun-7-2023