Selective Expert Guidance for Effective and Diverse Exploration in Reinforcement Learning of LLMs