Offline Learning for Combinatorial Multi-armed Bandits

Liu, Xutong, Dai, Xiangxiang, Zuo, Jinhang, Wang, Siwei, Joe-Wong, Carlee, Lui, John C. S., Chen, Wei

Jan-31-2025 – arXiv.org Artificial Intelligence

The combinatorial multi-armed bandit (CMAB) is a fundamental sequential decision-making framework that has been studied extensively over the past decade. However, existing work primarily focuses on the online setting, overlooking the substantial costs of online interactions and the readily available offline datasets. To overcome these limitations, we introduce Off-CMAB, the first offline learning framework for CMAB. Central to our framework is the combinatorial lower confidence bound (CLCB) algorithm, which combines pessimistic reward estimations with combinatorial solvers. To characterize the quality of offline datasets, we propose two novel data coverage conditions and prove that, under these conditions, CLCB achieves a near-optimal suboptimality gap, matching the theoretical lower bound up to a logarithmic factor. We validate Off-CMAB through practical applications, including learning to rank, large language model (LLM) caching, and social influence maximization, showing its ability to handle nonlinear reward functions, general feedback models, and out-of-distribution action samples that exclude optimal or even feasible actions. Extensive experiments on synthetic and real-world datasets further highlight the superior performance of CLCB.
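The idea of pairing pessimistic estimates with a combinatorial solver can be illustrated with a minimal sketch. It is not the paper's algorithm as specified: the abstract does not state the confidence radius, the feedback model, or the solver interface, so the Hoeffding-style radius, the `(arm, reward)` dataset format, the linear top-k objective, and the name `clcb_top_k` are all assumptions made here for illustration.

```python
import math


def clcb_top_k(dataset, num_arms, k, delta=0.05):
    """Pick k arms by lower confidence bound from logged data.

    dataset: list of (arm_index, reward) pairs, rewards assumed in [0, 1].
    The radius below is a Hoeffding-style choice (an assumption, not taken
    from the paper).
    """
    counts = [0] * num_arms
    sums = [0.0] * num_arms
    for arm, reward in dataset:
        counts[arm] += 1
        sums[arm] += reward

    lcb = []
    for arm in range(num_arms):
        if counts[arm] == 0:
            # Never observed offline: use the most pessimistic value.
            lcb.append(0.0)
            continue
        mean = sums[arm] / counts[arm]
        radius = math.sqrt(math.log(2 * num_arms / delta) / (2 * counts[arm]))
        lcb.append(max(0.0, mean - radius))

    # Stand-in combinatorial solver: sorting is exact for a linear top-k
    # objective; other problems would plug in their own solver here.
    chosen = sorted(range(num_arms), key=lambda a: lcb[a], reverse=True)[:k]
    return sorted(chosen), lcb


# Arm 2 has a perfect empirical mean but only one sample, so pessimism
# discounts it in favor of well-covered arms 0 and 1.
logged = [(0, 1.0)] * 100 + [(1, 1.0), (1, 0.0)] * 50 + [(2, 1.0)]
chosen, lcb = clcb_top_k(logged, num_arms=4, k=2)
print(chosen)
```

In this toy dataset the poorly covered arm is penalized by its wide confidence radius, which is the intuition behind why data coverage conditions matter for an offline guarantee.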

large language model, machine learning, reinforcement learning

Country:
- North America > United States
  - Pennsylvania > Allegheny County > Pittsburgh (0.14)
- Europe
  - United Kingdom > England
    - Cambridgeshire > Cambridge (0.04)
  - Portugal > Porto
    - Porto (0.04)
- Asia
  - Middle East > Jordan (0.04)
  - India > Karnataka
    - Bengaluru (0.04)
  - China
    - Hong Kong (0.04)
    - Beijing > Beijing (0.04)

Genre:
- Research Report (0.81)

Industry:
- Health & Medicine (1.00)

Technology:
- Information Technology
  - Data Science > Data Mining
    - Big Data (1.00)
  - Artificial Intelligence
    - Representation & Reasoning (1.00)
    - Natural Language > Large Language Model (1.00)
    - Machine Learning
      - Reinforcement Learning (1.00)
      - Neural Networks > Deep Learning (0.46)
