Adaptive Policy Backbone via Shared Network
arXiv.org Artificial Intelligence
Reinforcement learning (RL) has demonstrated an impressive ability to learn high-performing policies across diverse domains, including gaming (Mnih et al., 2015; Vinyals et al., 2019; Berner et al., 2019), robotics (Levine et al., 2016; Akkaya et al., 2019), traffic control (El-Tantawy et al., 2013), and the alignment of large language models with human preferences (Stiennon et al., 2020). However, learning a high-performing policy in RL typically requires extensive data collection, which discourages practical deployment. To alleviate this burden, recent work has explored leveraging priors, either via a pre-collected dataset (Levine et al., 2020) or a reference policy (Xie et al., 2021; Kalashnikov et al., 2018). In practice, however, the deployment task often differs from the task represented by the dataset or reference policy; such task mismatch can substantially diminish the utility of these priors. To exploit priors despite this mismatch, several approaches have been proposed in the context of meta-RL (Wang et al., 2016; Duan et al., 2016; Finn et al., 2017; Rakelly et al., 2019), which aim to use prior knowledge for efficient adaptation, either by (i) improving the sample efficiency of standard RL or by (ii) enabling rapid adaptation to new tasks.
Sep-29-2025
- Genre:
- Research Report > New Finding (0.68)
- Industry:
- Transportation (0.46)
- Technology:
- Information Technology > Artificial Intelligence
- Robots (1.00)
- Representation & Reasoning (1.00)
- Natural Language (0.86)
- Machine Learning
- Reinforcement Learning (1.00)
- Neural Networks (0.93)