Transfer Q-learning
Chen, Elynn; Li, Sai; Jordan, Michael I.
arXiv.org Artificial Intelligence
Time-inhomogeneous finite-horizon Markov decision processes (MDPs) are frequently used to model decision-making in dynamic treatment regimes and other statistical reinforcement learning (RL) scenarios. These applications, especially in healthcare and business, often face high-dimensional state spaces and time-inhomogeneity of the MDP, compounded by limited sample sizes that make informed decision-making difficult. To overcome these challenges, we investigate knowledge transfer within time-inhomogeneous finite-horizon MDPs by leveraging data from both a target RL task and several related source tasks. We develop transfer learning (TL) algorithms that are adaptable to both batch and online $Q$-learning, integrating valuable insights from offline source studies. The proposed transfer $Q$-learning algorithm contains a novel {\em re-targeting} step that enables {\em cross-stage transfer} along multiple stages in an RL task, in addition to the usual {\em cross-task transfer} found in supervised learning. We establish the first theoretical justifications of TL in RL tasks by showing a faster rate of convergence of the $Q^*$-function estimation in offline RL transfer, and a lower regret bound in offline-to-online RL transfer, under stage-wise reward similarity and mild design similarity across tasks. Empirical evidence from both synthetic and real datasets is presented to evaluate the proposed algorithm and support our theoretical results.
Oct-21-2025
- Country:
  - Asia
    - China > Beijing
      - Beijing (0.04)
    - Middle East > Jordan (0.05)
  - North America > United States
    - California > Alameda County
      - Berkeley (0.14)
    - New York (0.04)
- Genre:
  - Research Report (1.00)
  - Workflow (0.67)
- Industry:
  - Health & Medicine > Therapeutic Area (1.00)