Policy Finetuning: Bridging Sample-Efficient Offline and Online Reinforcement Learning
Neural Information Processing Systems
Existing algorithms and theories for learning near-optimal policies in offline and online reinforcement learning (RL) are rather different and disconnected. Towards bridging this gap, this paper initiates the theoretical study of policy finetuning, that is, online RL where the learner has additional access to a "reference policy" µ close to the optimal policy π
Mar-22-2025, 19:02:37 GMT
- Genre:
- Instructional Material > Online (0.40)
- Industry:
- Leisure & Entertainment > Games (0.92)
- Technology: