Adaptive Policy Backbone via Shared Network

Sep-29-2025–arXiv.org Artificial Intelligence

Reinforcement learning (RL) has demonstrated an impressive ability to learn high-performing policies across diverse domains, including games (Mnih et al., 2015; Vinyals et al., 2019; Berner et al., 2019), robotics (Levine et al., 2016; Akkaya et al., 2019), traffic control (El-Tantawy et al., 2013), and aligning large language models with human preferences (Stiennon et al., 2020). However, learning such a policy typically requires extensive data collection, which hinders practical deployment. To reduce this burden, recent work has explored leveraging priors, either in the form of a pre-collected dataset (Levine et al., 2020) or a reference policy (Xie et al., 2021; Kalashnikov et al., 2018). In practice, however, the deployment task often differs from the task represented by the dataset or reference policy, and this mismatch can substantially diminish the utility of these priors. To exploit priors despite such mismatch, several approaches have been proposed in the context of meta-RL (Wang et al., 2016; Duan et al., 2016; Finn et al., 2017; Rakelly et al., 2019). These methods use prior knowledge for efficient adaptation, either by (i) improving the sample efficiency of standard RL or (ii) enabling rapid adaptation to new tasks.
