Policy Finetuning: Bridging Sample-Efficient Offline and Online Reinforcement Learning

Mar-22-2025, 19:02:37 GMT–Neural Information Processing Systems

However, existing algorithms and theories for learning near-optimal policies in these two settings are rather different and disconnected. Towards bridging this gap, this paper initiates the theoretical study of policy finetuning, that is, online RL where the learner has additional access to a "reference policy" µ close to the optimal policy π

artificial intelligence, machine learning, reinforcement learning

Genre:
- Instructional Material > Online (0.40)

Industry:
- Leisure & Entertainment > Games (0.92)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning
    - Learning Graphical Models > Undirected Networks
      - Markov Models (0.46)
    - Reinforcement Learning (1.00)
  - Representation & Reasoning (1.00)
  - Robots (0.92)