Skill-Critic: Refining Learned Skills for Reinforcement Learning
Ce Hao, Catherine Weaver, Chen Tang, Kenta Kawamoto, Masayoshi Tomizuka, Wei Zhan
Incorporating prior experience by learning from demonstration can facilitate efficient exploration in complex environments [9]. For example, statistical methods can infer the hidden structure of offline data and inform the decision-making process [6, 7]. However, offline data alone may not suffice for determining an optimal policy, particularly when the data originates from simpler environments or pertains to intricate or stochastic tasks. In such cases, online policy optimization is imperative to refine suboptimal policies. In this work, we present a hierarchical RL framework that can leverage offline data to accelerate RL training without limiting its performance by the quality of offline data. Our framework employs skills, temporally extended sequences of primitive actions [10]. Previous works extract skills from unstructured data and transfer them to downstream RL tasks with a skill selection policy whose action space is the skill itself [11].

Figure 1: Our Skill-Critic approach leverages low-coverage demonstrations to facilitate hierarchical reinforcement learning by (1) acquiring a basic skill-set from demonstrations that (2) guides learning of online skill selection and skill improvement.
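The last two sentences describe the standard skill-based hierarchy this framework builds on: a high-level policy whose action space is a skill (e.g., a latent skill code), and a low-level policy that decodes that skill into primitive actions over a fixed horizon. The sketch below illustrates that control loop only; the names (`high_level`, `low_level`, the horizon `H`) and the simplified Gym-style `env.step` interface are illustrative assumptions, not the paper's implementation.

```python
H = 10  # assumed skill horizon: primitive steps executed per selected skill

def rollout(env, high_level, low_level, max_steps=1000):
    """Run one episode of skill-based hierarchical control.

    Hypothetical interfaces (assumptions for this sketch):
      high_level(state)    -> skill latent z (the selection policy's action is the skill itself)
      low_level(state, z)  -> primitive action decoded from the skill latent
      env.step(action)     -> (next_state, reward, done), simplified Gym-style
    """
    state, total_reward, t = env.reset(), 0.0, 0
    while t < max_steps:
        z = high_level(state)                # high-level policy picks a skill
        for _ in range(H):                   # temporally extended execution of that skill
            action = low_level(state, z)     # low-level policy emits primitive actions
            state, reward, done = env.step(action)
            total_reward += reward
            t += 1
            if done or t >= max_steps:
                return total_reward
    return total_reward
```

In this structure, online RL can improve the skill selection policy, while Skill-Critic additionally refines the low-level skill policy rather than keeping it frozen after offline pretraining.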
arXiv.org Artificial Intelligence
Jun-15-2023