Transfer Q-learning

Chen, Elynn, Li, Sai, Jordan, Michael I.

Oct-21-2025–arXiv.org Artificial Intelligence

Time-inhomogeneous finite-horizon Markov decision processes (MDP) are frequently employed to model decision-making in dynamic treatment regimes and other statistical reinforcement learning (RL) scenarios. These fields, especially healthcare and business, often face challenges such as high-dimensional state spaces and time-inhomogeneity of the MDP process, compounded by insufficient sample availability which complicates informed decision-making. To overcome these challenges, we investigate knowledge transfer within time-inhomogeneous finite-horizon MDP by leveraging data from both a target RL task and several related source tasks. We have developed transfer learning (TL) algorithms that are adaptable for both batch and online $Q$-learning, integrating valuable insights from offline source studies. The proposed transfer $Q$-learning algorithm contains a novel {\em re-targeting} step that enables {\em cross-stage transfer} along multiple stages in an RL task, besides the usual {\em cross-task transfer} for supervised learning. We establish the first theoretical justifications of TL in RL tasks by showing a faster rate of convergence of the $Q^*$-function estimation in the offline RL transfer, and a lower regret bound in the offline-to-online RL transfer under stage-wise reward similarity and mild design similarity across tasks. Empirical evidence from both synthetic and real datasets is presented to evaluate the proposed algorithm and support our theoretical results.

machine learning, reinforcement learning, trajectory, (19 more...)

arXiv.org Artificial Intelligence

Oct-21-2025

arXiv.org PDF

Add feedback

Country:
- Asia (0.47)
- North America > United States
  - California (0.28)

Genre:
- Research Report (1.00)
- Workflow (0.67)

Industry:
- Health & Medicine > Therapeutic Area (1.00)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Statistical Learning (1.00)
  - Reinforcement Learning (1.00)
  - Learning Graphical Models > Undirected Networks
    - Markov Models (0.55)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found