ASPiRe: Adaptive Skill Priors for Reinforcement Learning

Neural Information Processing Systems 

We find that the sample size has almost no impact on learning. Notice that the target KL divergence imposed on Ant Maze is higher than the one imposed on Point Maze. A higher target KL divergence gives the policy more "space" to explore around the composite skill prior. As the target KL divergence increases, the learned policy receives less guidance from the prior. The algorithm is not sensitive to this parameter.
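To illustrate why a higher target KL divergence translates into less guidance from the prior, the following is a minimal sketch, not the paper's implementation. It assumes the regularization weight on the KL term (here called `alpha`) is tuned automatically by gradient descent on the loss `alpha * (target_kl - kl)`, as is common in KL-constrained policy learning. The function names, the learning rate, and the fixed KL value of 2.0 are hypothetical choices made only for this example.

```python
import math


def update_alpha(alpha, kl, target_kl, lr=0.01):
    """One gradient step on log(alpha) for the loss alpha * (target_kl - kl).

    Assumption: alpha weights the KL penalty between the policy and the
    skill prior. Optimizing log(alpha) keeps alpha positive.
    """
    log_alpha = math.log(alpha)
    grad = alpha * (target_kl - kl)  # derivative of the loss w.r.t. log(alpha)
    log_alpha -= lr * grad
    return math.exp(log_alpha)


def tune(target_kl, kl=2.0, alpha=1.0, steps=10):
    """Run a few updates with the measured KL held fixed (illustration only)."""
    for _ in range(steps):
        alpha = update_alpha(alpha, kl, target_kl)
    return alpha


# Hypothetical values: the policy currently sits at KL = 2.0 from the prior.
alpha_tight = tune(target_kl=1.0)  # KL exceeds the target, so alpha grows
alpha_loose = tune(target_kl=5.0)  # KL is below the target, so alpha shrinks

print(f"alpha with target_kl=1.0: {alpha_tight:.3f}")
print(f"alpha with target_kl=5.0: {alpha_loose:.3f}")
```

Under these assumptions, the lower target drives `alpha` up and pulls the policy toward the prior, while the higher target lets `alpha` decay, so the prior contributes less to the policy update.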