Variational Quantum Circuits in Offline Contextual Bandit Problems

Lukas Schulte, Daniel Hein, Steffen Udluft, Thomas A. Runkler

arXiv.org Artificial Intelligence 

Abstract--This paper explores the application of variational quantum circuits (VQCs) to offline contextual bandit problems in industrial optimization tasks. Using the Industrial Benchmark (IB) environment, we evaluate the performance of quantum regression models against classical models. Our findings demonstrate that quantum models can effectively fit complex reward functions, identify optimal configurations via particle swarm optimization (PSO), and generalize well on noisy and sparse datasets. These results provide a proof of concept for using VQCs in offline contextual bandit problems and highlight their potential in industrial optimization tasks.

Contextual bandit algorithms have emerged as powerful tools for decision-making under uncertainty. Driven by the increasing demand for personalization and adaptive decision-making, contextual bandits have been widely adopted in various domains, including recommender systems [1], [2], online advertising [3], and healthcare [4], where decisions must be made based on contextual information to maximize user engagement, click-through rates, or patient outcomes. In industrial applications, where systems must be continuously tuned or "steered" for optimal performance, contextual bandits offer a natural approach to optimizing system configurations. In these settings, each decision depends on contextual information (e.g., the current operational state or environmental conditions), and the overall objective is to maximize some notion of reward (e.g., production throughput, energy efficiency, or product quality).
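To make the offline decision step concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation: given a context s and a reward model r̂(s, a) fitted to logged data, a configuration is chosen by maximizing r̂(s, ·) over the action space with a standard global-best particle swarm optimizer. The function `toy_reward_model` is a hypothetical classical stand-in for the VQC regression model, and the PSO hyperparameters (`w`, `c1`, `c2`, swarm size, action bounds) are illustrative assumptions rather than values from the paper.

```python
import numpy as np


def toy_reward_model(context, action):
    # HYPOTHETICAL stand-in for a fitted reward model r_hat(s, a); not the paper's VQC.
    # The best action shifts with the context, as in a contextual bandit.
    target = np.tanh(context)
    return -np.sum((action - target) ** 2, axis=-1)


def pso_maximize(f, dim, n_particles=30, n_iters=100, bounds=(-1.0, 1.0),
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO that maximizes f over a box; f maps (n, dim) -> (n,)."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    pos = rng.uniform(lo, hi, size=(n_particles, dim))  # candidate actions
    vel = np.zeros_like(pos)
    pbest = pos.copy()                                   # each particle's best position
    pbest_val = f(pos)
    g = np.argmax(pbest_val)
    gbest, gbest_val = pbest[g].copy(), pbest_val[g]     # swarm-wide best
    for _ in range(n_iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Inertia + pull toward personal best + pull toward global best.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)                 # keep actions inside the box
        val = f(pos)
        improved = val > pbest_val
        pbest[improved] = pos[improved]
        pbest_val[improved] = val[improved]
        g = np.argmax(pbest_val)
        if pbest_val[g] > gbest_val:
            gbest, gbest_val = pbest[g].copy(), pbest_val[g]
    return gbest, gbest_val


# Usage: pick the configuration for one observed (hypothetical) context.
context = np.array([0.5, -0.3, 1.2])
best_action, best_reward = pso_maximize(lambda a: toy_reward_model(context, a), dim=3)
```

Because PSO only queries the reward model through forward evaluations, the same loop would work unchanged if `toy_reward_model` were replaced by any trained regressor, classical or quantum, that accepts a batch of actions.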